No tuning, no heavy lifting: Shanghai Jiao Tong University and Shanghai AI Lab unveil the "Memory Decoder," letting any LLM adapt seamlessly
36Kr· 2025-08-26 09:17
Large language models (LLMs) often underperform in specialized fields such as medicine, finance, and law for lack of deep domain knowledge, and getting an LLM to perform at its best in different specialized domains remains a major challenge. The mainstream approaches today are domain-adaptive pretraining (DAPT) and retrieval-augmented generation (RAG). DAPT, however, requires time-consuming full-parameter training, is prone to catastrophic forgetting, and makes it hard to adapt multiple models to the same domain efficiently; RAG, for its part, greatly increases inference latency because of expensive kNN searches and longer contexts. Moreover, given the inherent tension between RAG's plug-and-play nature and DAPT's inference efficiency, a solution that both adapts across models and stays computationally efficient at deployment has remained an open gap. To fill it, a research team from Shanghai Jiao Tong University and Shanghai AI Lab proposes a plug-and-play pretrained memory module, the "Memory Decoder," which adapts LLMs to a domain efficiently, across models of different sizes, without modifying the original model's parameters. Paper link: https://arxiv.org/abs/2508.09874v1 The core innovation of the Memory Decoder is its plug-and-play nature: once trained, a single Memory Decoder can be seamlessly integrated into any LLM that uses the same tokenizer, without requiring model ...
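The excerpt does not spell out how the memory module's predictions are merged with the base LLM's. A minimal sketch of one plausible plug-and-play scheme is to interpolate the two models' next-token distributions over the shared vocabulary; the function names and the mixing weight `lam` below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_next_token_dist(base_logits, memory_logits, lam=0.3):
    """Interpolate the base LLM's next-token distribution with the
    memory module's distribution over the same vocabulary."""
    p_base = softmax(base_logits)
    p_mem = softmax(memory_logits)
    return (1 - lam) * p_base + lam * p_mem

# toy vocabulary of 5 tokens
base = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
mem = np.array([-1.0, 3.0, 0.0, 0.0, 0.0])  # domain memory strongly favors token 1
p = combined_next_token_dist(base, mem, lam=0.5)
print(p.argmax())  # → 1: the memory module shifts the prediction
```

Because the combination happens only at the output distribution, the same trained memory module can in principle sit next to any base model that shares its tokenizer, which is the plug-and-play property the article describes.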
Can large models generate high-performance kernels for different hardware platforms? Nanjing University and Zhejiang University propose MultiKernelBench, a cross-platform kernel-generation benchmark
机器之心· 2025-08-25 02:48
In deep learning inference and training, the vast majority of computation is carried out by low-level compute kernels: small, high-performance programs that run on hardware accelerators (GPUs, NPUs, TPUs) and implement core operators such as matrix multiplication, convolution, and normalization. Today these kernels are typically hand-written by developers in hardware-specific parallel programming languages such as CUDA, AscendC, and Pallas, which demands refined performance-tuning skills and a deep understanding of the underlying hardware architecture. In recent years, breakthroughs by large language models (LLMs) in code generation have made automatically generating high-performance deep learning kernels a new research hotspot, and benchmarks such as KernelBench and TritonBench have emerged, focusing mainly on evaluating LLM kernel generation for NVIDIA GPUs. Prior work shows that existing LLMs already have some GPU kernel-generation ability: for example, NVIDIA engineers built a workflow on DeepSeek-R1 that, on simple CUDA kernel-generation tasks, produced kernels that were all numerically correct, a 100% pass rate. However, AI accelerator architectures are becoming increasingly diverse (NVIDIA GPUs, Huawei Ascend NPUs, Google TPUs, Intel GPUs, etc.), and their underlying ...
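Benchmarks of this kind judge a generated kernel first by numerical agreement with a reference operator, as in the "numerically correct, 100% pass rate" result above. A toy sketch of such a correctness check follows; all names are hypothetical, and a naive host-side matmul stands in for a real CUDA or AscendC kernel:

```python
import numpy as np

def reference_matmul(a, b):
    # the "ground truth" operator the generated kernel must match
    return a @ b

def generated_kernel(a, b):
    # stand-in for an LLM-generated hardware kernel: a naive
    # triple-loop matmul executed on the host
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(m):
        for j in range(n):
            s = 0.0
            for t in range(k):
                s += a[i, t] * b[t, j]
            out[i, j] = s
    return out

def passes_numerical_check(kernel, ref, shapes=((8, 4), (4, 6)), atol=1e-5):
    # run both implementations on random inputs and compare elementwise
    rng = np.random.default_rng(0)
    a = rng.standard_normal(shapes[0])
    b = rng.standard_normal(shapes[1])
    return bool(np.allclose(kernel(a, b), ref(a, b), atol=atol))

print(passes_numerical_check(generated_kernel, reference_matmul))  # → True
```

Real benchmarks additionally compile for the target backend and measure speedups against vendor libraries; the numerical check above is only the pass/fail gate.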
China: Policymaking amid a stock market rally and growth slowdown (1)
2025-08-25 01:40
Asia Insights | Global Markets Research | Economics - Asia ex-Japan
China: Policymaking amid a stock market rally and growth slowdown
As economists we do not forecast stock markets, but we closely watch their impact on policymaking and economic activity. We believe that China's stock market rally since late September last year has been primarily driven by solid fundamentals. However, based on what previously happened, the boom has the potential to lead to a rise in irrational exuberance, an increase in leveragi ...
In depth | Perplexity CEO: Our goal is to build a new ecosystem: an entirely new kind of product, the "agent browser"
Z Potentials· 2025-08-20 04:19
Core Insights
- The article discusses the launch and capabilities of the Comet browser by Perplexity AI, aiming to create an AI operating system that enhances user productivity through automation and integration with various applications [3][9][10].

Group 1: Comet Browser Features
- Comet is designed to handle asynchronous and repetitive tasks, providing a seamless user experience by integrating with existing applications like iMessage and email [4][5].
- The browser aims to act as a central hub for managing various digital tasks, allowing users to automate workflows and access information across different platforms [9][10].
- The concept of "context engineering" is introduced, emphasizing the need for AI to autonomously gather and utilize context from various communication tools to enhance user efficiency [5][6].

Group 2: AI and User Interaction
- The discussion highlights the importance of achieving a natural and fluid interaction between AI and users, focusing on both intelligence and contextual understanding [6][4].
- The browser is positioned as a next-generation tool that can evolve with advancements in AI models, enhancing its capabilities over time [8][9].
- The potential for AI to automate digital labor is compared to autonomous driving, suggesting that AI can free up time for users by handling complex tasks [4][6].

Group 3: Market Position and User Adoption
- Since its launch, Comet has seen a steady increase in user adoption, with a waitlist nearing one million, indicating strong market interest despite its early-stage development [9][10].
- The company aims to create a new category of "agent browsers," differentiating itself from traditional browsers and focusing on building a unique ecosystem [9][10].
- The competitive landscape is discussed, with the expectation that larger players like OpenAI and Google will also enter the agent browser space, further validating the concept [9][10].

Group 4: Challenges and Future Directions
- The article addresses the technical challenges of building a robust infrastructure to support the complex interactions required for the Comet browser [28][29].
- There is an emphasis on the need for continuous improvement and adaptation to user feedback, with a focus on maintaining a high-quality user experience [29][34].
- The potential for future hardware development is mentioned, but the primary focus remains on refining the software capabilities of the browser [21][22][25].
New survey! A comprehensive overview of diffusion language models
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint
- The article discusses the competition between two major paradigms in generative AI: Diffusion Models and Autoregressive (AR) Models, highlighting the emergence of Diffusion Language Models (DLMs) as a potential breakthrough in the field of large language models [2][3].

Group 1: DLM Advantages Over AR Models
- DLMs offer parallel generation capabilities, significantly improving inference speed by achieving a tenfold increase compared to AR models, which are limited by token-level serial processing [11][12].
- DLMs utilize bidirectional context, enhancing language understanding and generation control, allowing for finer adjustments in output characteristics such as sentiment and structure [12][14].
- The iterative denoising mechanism of DLMs allows for corrections during the generation process, reducing the accumulation of early errors, which is a limitation in AR models [13].
- DLMs are naturally suited for multimodal applications, enabling the integration of text and visual data without the need for separate modules, thus enhancing the quality of joint generation tasks [14].

Group 2: Technical Landscape of DLMs
- DLMs are categorized into three paradigms: Continuous Space DLMs, Discrete Space DLMs, and Hybrid AR-DLMs, each with distinct advantages and applications [15][20].
- Continuous Space DLMs leverage established diffusion techniques from image models but may suffer from semantic loss during the embedding process [20].
- Discrete Space DLMs operate directly on token levels, maintaining semantic integrity and simplifying the inference process, making them the mainstream approach in large parameter models [21].
- Hybrid AR-DLMs combine the strengths of AR models and DLMs, balancing efficiency and quality for tasks requiring high coherence [22].

Group 3: Training and Inference Optimization
- DLMs utilize transfer learning to reduce training costs, with methods such as initializing from AR models or image diffusion models, significantly lowering data requirements [30][31].
- The article outlines three main directions for inference optimization: parallel decoding, masking strategies, and efficiency technologies, all aimed at enhancing speed and quality [35][38].
- Techniques like confidence-aware decoding and dynamic masking are highlighted as key innovations to improve the quality of generated outputs while maintaining high inference speeds [38][39].

Group 4: Multimodal Applications and Industry Impact
- DLMs are increasingly applied in multimodal contexts, allowing for unified processing of text and visual data, which enhances capabilities in tasks like visual reasoning and joint content creation [44].
- The article presents various case studies demonstrating DLMs' effectiveness in high-value vertical applications, such as code generation and computational biology, showcasing their potential in real-world scenarios [46].
- DLMs are positioned as a transformative technology in industries, with applications ranging from real-time code generation to complex molecular design, indicating their broad utility [46][47].

Group 5: Challenges and Future Directions
- The article identifies key challenges facing DLMs, including the trade-off between parallelism and performance, infrastructure limitations, and scalability issues compared to AR models [49][53].
- Future research directions are proposed, focusing on improving training objectives, building dedicated toolchains, and enhancing long-sequence processing capabilities [54][56].
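The parallel, iterative denoising loop with confidence-aware unmasking described above can be illustrated with a toy sketch. Here `toy_model` is a stand-in that peeks at a known target sequence and assigns random confidences; it is not a real DLM, and the step schedule is an illustrative assumption:

```python
import random

MASK = "<mask>"
TARGET = ["diffusion", "models", "decode", "tokens", "in", "parallel"]

def toy_model(seq):
    """Stand-in for a DLM forward pass: for each masked position,
    return a predicted token and a confidence score (here derived
    from the known target plus random noise)."""
    preds = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            preds[i] = (TARGET[i], random.random())
    return preds

def iterative_denoise(length, steps=3):
    seq = [MASK] * length          # start from a fully masked sequence
    per_step = max(1, length // steps)
    while MASK in seq:
        preds = toy_model(seq)     # predict all masked positions in parallel
        # confidence-aware decoding: commit only the most confident
        # predictions this step, re-mask the rest for later refinement
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for i in best:
            seq[i] = preds[i][0]
    return seq

random.seed(0)
print(" ".join(iterative_denoise(len(TARGET))))
# → diffusion models decode tokens in parallel
```

Each loop iteration fills several positions at once, which is the source of the speedup over token-by-token AR decoding; committing only high-confidence tokens per step mirrors the quality-preserving masking strategies the survey highlights.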
How do large models reason? A key Stanford CS25 lecture, taught by a DeepMind principal scientist
机器之心· 2025-08-16 05:02
Core Insights
- The article discusses the insights shared by Denny Zhou, a leading figure in AI, regarding the reasoning capabilities of large language models (LLMs) and their optimization methods [3][4].

Group 1: Key Points on LLM Reasoning
- Denny Zhou emphasizes that reasoning in LLMs involves generating a series of intermediate tokens before arriving at a final answer, which enhances the model's strength without increasing its size [6][15].
- The challenge lies in the fact that reasoning-based outputs often do not appear at the top of the output distribution, making standard greedy decoding ineffective [6].
- Techniques such as chain-of-thought prompting and reinforcement learning fine-tuning have emerged as powerful methods to enhance LLM reasoning capabilities [6][29].

Group 2: Theoretical Framework
- Zhou proposes that any problem solvable by Boolean circuits can be addressed by generating intermediate tokens using a constant-sized transformer model, indicating a theoretical understanding of reasoning [16].
- The importance of intermediate tokens in reasoning is highlighted, as they allow models to solve complex problems without requiring deep architectures [16].

Group 3: Decoding Techniques
- The article introduces the concept of chain-of-thought decoding, which involves checking multiple generated candidates rather than relying on a single most likely answer [22][27].
- This method requires programming effort but can significantly improve reasoning outcomes by guiding the model through natural language prompts [27].

Group 4: Self-Improvement and Data Generation
- The self-improvement approach allows models to generate their own training data, reducing reliance on human-annotated datasets [39].
- The concept of rejection sampling is introduced, where models generate solutions and select the correct steps based on achieving the right answers [40].

Group 5: Reinforcement Learning and Fine-Tuning
- Reinforcement learning fine-tuning (RL fine-tuning) has gained attention for its ability to enhance model generalization, although not all tasks can be validated by machines [42][57].
- The article discusses the importance of reliable validators in RL fine-tuning, emphasizing that the quality of machine-generated training data can sometimes surpass human-generated data [45][37].

Group 6: Future Directions
- Zhou expresses anticipation for breakthroughs in tasks that extend beyond unique, verifiable answers, suggesting a shift in focus towards building practical applications rather than solely addressing academic benchmarks [66].
- The article concludes with a reminder that simplicity in research can lead to clearer insights, echoing Richard Feynman's philosophy [68].
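The chain-of-thought decoding idea above can be sketched in a few lines: instead of trusting the single greedy continuation, inspect several candidate paths and return the answer whose tokens the model is most confident about. The candidate list and scores below are hard-coded illustrations, not real model output:

```python
def candidate_paths(prompt):
    # stand-in for branching over the top-k first tokens of a real LLM;
    # each path is (continuation, final_answer, answer_confidence)
    return [
        ("6",                             "6", 0.45),  # greedy path: terse, low confidence
        ("3 + 3 = 6, so the answer is 6", "6", 0.92),  # reasoning path: high answer confidence
        ("I think it is 5",               "5", 0.30),  # wrong path
    ]

def cot_decode(prompt):
    # chain-of-thought decoding: rank all candidates by the model's
    # confidence in the answer tokens, rather than taking the single
    # most likely continuation
    paths = candidate_paths(prompt)
    best = max(paths, key=lambda p: p[2])
    return best[1]

print(cot_decode("I have 3 apples and buy 3 more. How many?"))  # → 6
```

The point the lecture makes is that continuations containing reasoning steps tend to carry higher confidence on their answer tokens, so selecting by that signal surfaces chains of thought without any special prompting.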
Navigating China Internet: Top AI apps tracker: AI model upgrades, ARR trends and focus on chip supply; healthy July app engagement
2025-08-14 01:36
Summary of Key Points from the Conference Call

Industry Overview
- The conference call focuses on the **China Internet** industry, particularly the **AI applications** sector and its dynamics in July 2025, highlighting trends in **cloud service providers (CSP)** and **AI model performance**.

Core Insights and Arguments
1. **Chip Supply Dynamics**: The evolving dynamics of Nvidia's H20 chip supply are crucial, with potential resumption of chip sales to China being discussed. This could lead to a significant increase in CSP capital expenditures (capex), projected to rise by **42% quarter-over-quarter in 3Q25** from a likely low in 2Q25 [1].
2. **AI Model Launches**: Continued launches of foundation models are noted, with performance gaps between US and Chinese models narrowing. OpenAI's GPT-5 launch is mentioned, but new models from Chinese platforms like Zhipu's GLM-4.5 and Alibaba's Qwen are showing competitive performance [1].
3. **Annual Recurring Revenue (ARR) Trends**: Monthly ARR trends for popular AI video generation models are highlighted, with **80% of China's AI ARR generated from overseas**, despite only capturing **5% of the total global AI applications revenue**. Key applications include video generation and image editing [1].
4. **Engagement Trends**: There is a noted **6% month-over-month decline** in engagement for consumer-facing AI chatbots in July, attributed to increased integration of AI functions into super-apps. Specific apps like DeepSeek and Doubao saw declines of **10% and 13% month-over-month**, respectively [1].
5. **Enterprise AI Adoption**: The adoption of AI by Chinese enterprises is accelerating, with token usage increasing by **404% and 284% year-over-year** for AI-native apps and in-app AIs, respectively. Notably, **66% of the top 30 AI apps** are developed by major internet companies: Alibaba, Baidu, ByteDance, and Tencent [6].
6. **Mobile App Engagement**: Overall engagement across the top 400 mobile apps increased by **6% year-over-year** in July 2025, with significant growth in Weixin and Douyin app engagement, which grew by **6% and 19% year-over-year**, respectively [7].
7. **E-commerce and Local Services**: E-commerce engagement grew by **14% year-over-year**, with JD and Taobao showing strong growth rates of **76% and 11% year-over-year**. Local services engagement also accelerated to **18% year-over-year** [11].
8. **Gaming Engagement**: Gaming engagement increased by **3% year-over-year** in July, with specific titles like Tencent's DnF mobile maintaining stable time spent shares [10].

Additional Important Insights
- The report emphasizes a more defensive investment strategy due to weaker profit setups in transaction platforms, particularly in e-commerce and local services [10].
- The competitive landscape for AI applications is evolving, with significant implications for gaming and video generation due to advancements in multi-modal AI models [1].
- The report includes detailed statistics on the performance of various AI applications, highlighting the competitive positioning of companies like Kuaishou and ByteDance in the AI video generation space [36].

This summary encapsulates the key points discussed in the conference call, providing insights into the current state and future outlook of the China Internet and AI applications industry.
Thinking of Buying Alibaba Stock? Here's 1 Green Flag and 1 Red Flag.
The Motley Fool· 2025-08-10 08:25
Core Viewpoint
- Alibaba is undergoing a significant transformation, focusing on artificial intelligence (AI) and cloud computing to redefine its growth story amidst challenges in its core e-commerce business [1][14].

Group 1: AI and Cloud Strategy
- Alibaba is transitioning from being solely an e-commerce platform to becoming an AI-native enterprise, with Alibaba Cloud at the center of this shift [4].
- Alibaba Cloud has repositioned itself around AI, integrating with Qwen, its open-source large language model (LLM), which enhances its capabilities beyond traditional cloud services [5][6].
- The open-source strategy for Qwen allows developers to build their own AI applications, positioning Alibaba Cloud to expand into emerging markets and Southeast Asia [7].
- Alibaba plans to invest approximately $50 billion in core infrastructure over the next three years, surpassing its total AI and cloud spending in the past decade, indicating a strong commitment to becoming a leading AI cloud provider [8].
- If successful, AI and cloud computing could serve as Alibaba's primary growth drivers for the next decade, similar to how AWS drives growth for Amazon [9].

Group 2: E-commerce Challenges
- Despite the focus on AI, Alibaba's core revenue still heavily relies on domestic commerce, which accounted for 45% of revenue and 113% of adjusted earnings before interest, taxes, and amortization (EBITA) in fiscal year 2025 [10].
- Revenue growth in the e-commerce segment is sluggish, with Taobao and Tmall revenue increasing only 3% in fiscal year 2025 due to weak consumer sentiment and intense competition from rivals like Pinduoduo and Douyin [11].
- Alibaba is attempting to enhance its shopping experiences with AI and reengage merchants and users, resulting in a 9% year-over-year growth in domestic e-commerce revenue in the March 2025 quarter [12].
- Sustaining this momentum is crucial, as structural pressures from competition and shifts in consumer behavior remain significant challenges [13].

Group 3: Investment Implications
- Alibaba is at a crossroads, balancing long-term success through AI and cloud initiatives with ongoing challenges in its e-commerce business [14].
- Investors seeking short-term growth may find better opportunities elsewhere, while those willing to wait for the AI strategy to materialize may see potential in Alibaba [15].
2025 Open Computing Technology Conference | Open source and openness drive system innovation and accelerate global AIDC collaboration
Sohu Finance· 2025-08-09 06:37
Core Insights
- The 2025 Open Computing Technology Conference held in Beijing focuses on the development trends of MoE large models and AI agents, emphasizing the importance of open computing in enhancing both vertical scaling and horizontal efficiency [1]
- Open-source large models are reshaping the global AI industry landscape, significantly lowering the barriers for enterprises and individual developers to access advanced AI capabilities, thus accelerating the shift from closed to open collaboration [3]
- The rise of open computing is fostering tighter collaboration within the data center industry chain, which is crucial for the rapidly evolving AI sector [4]

Industry Developments
- The MoE large models are experiencing rapid growth in parameter counts, necessitating innovations in computing architecture to meet the extreme demands for computing density and interconnect speed [4]
- The power requirements for AI data centers are projected to escalate from over 100 kW per cabinet to above 1 MW, indicating a shift towards GW-level power demands [4]
- China has a significant advantage in energy infrastructure, particularly in renewable energy, with 90% of new installations in Q1 2025 coming from renewable sources, contributing to 35.9% of the total power generation [5]

Collaborative Efforts
- The establishment of the "GW-level Open Intelligent Computing Center OCP China Community Group" aims to leverage China's strengths in energy and computing infrastructure to promote the implementation of AI open system strategies [5]
- OCP is actively collaborating with OCTC to explore the deployment of advanced AI infrastructure technologies and research outcomes in the Chinese market [5]
- Future initiatives will focus on creating a global open-source coalition to enhance collaboration among developers across different countries and regions, promoting innovation and integration within the global supply chain [6]
Six years on, why is OpenAI open-sourcing again?
Founder Park· 2025-08-06 14:00
Core Viewpoint
- OpenAI's release of the open-source model gpt-oss marks a significant strategic shift, indicating a clearer understanding of its value proposition beyond just the model itself, focusing on its user base and application ecosystem [2][4][13].

Group 1: OpenAI's Open-Source Model Release
- OpenAI has launched gpt-oss, its first open-source model since GPT-2, with performance comparable to its proprietary o4 mini model while reducing costs by at least 10 times [2][10].
- The gpt-oss-120b model achieved a score of 90.0 on the MMLU benchmark, while the gpt-oss-20b scored 85.3, indicating competitive performance in the open-source landscape [3][8].
- The models are designed to run efficiently on various hardware, from consumer-grade GPUs to cloud servers, and are licensed under Apache 2.0, allowing for commercial deployment without downstream usage restrictions [7][8].

Group 2: Strategic Implications
- OpenAI's move to open-source is not merely a technical sharing but aims to build an application ecosystem, targeting enterprises looking to deploy open-source AI models [5][12].
- The release reflects OpenAI's recognition that its core competitive advantage lies in its large user base and application ecosystem rather than just the models themselves [4][13].
- OpenAI's decision to avoid releasing training data, code, or technical reports suggests a strategy to attract businesses while potentially impacting academic research and the true open-source AI community [19][22].

Group 3: Competitive Landscape
- The introduction of gpt-oss is expected to challenge existing API products, with OpenAI positioning itself aggressively in the market by offering a model that significantly undercuts the cost of its proprietary offerings [10][11].
- The architecture of gpt-oss aligns with industry trends towards sparse MoE models, indicating a shift in design preferences within the AI community [14].
- The competitive landscape is evolving, with OpenAI's release potentially reversing the previous lag in open-source model applications compared to Chinese counterparts [21][22].

Group 4: Future Considerations
- The open-source model's ecosystem remains chaotic, with high-scoring models not necessarily being user-friendly, which could slow adoption rates [17][18].
- OpenAI's approach to model safety and fine-tuning raises questions about the balance between usability and security, which will need community validation [15][16].
- The ongoing competition between U.S. and Chinese open-source models highlights the need for strategic actions to maintain relevance and leadership in the AI space [20][22].