DeepSeek
Just In: DeepSeek Liang Wenfeng's NSA Paper and Peking University's Yang Yaodong Team Win ACL 2025 Best Paper Awards
36Kr · 2025-07-31 03:40
Core Insights
- The ACL conference, a leading event in computational linguistics and natural language processing (NLP), is set to take place in Vienna, Austria, from July 27 to August 1, 2025, marking its 63rd edition [1]
- This year's conference saw a record number of submissions, exceeding 8,000 papers compared to 4,407 last year, with acceptance rates of 20.3% for main conference papers and 16.7% for Findings [3]
- Over half of the first authors of the submitted papers are from China (51.3%), a significant increase from 30.6% last year, while the second-largest group comes from the United States (14.0%) [3]

Awards and Recognitions
- A total of 4 best papers, 2 best social impact papers, 3 best resource papers, 3 best theme papers, 26 outstanding papers, 2 best TACL papers, 1 best demo paper, and 47 SAC highlights were awarded this year [5]
- The best paper awards went to teams from DeepSeek and Peking University, as well as teams from CISPA Helmholtz Center for Information Security, TCS Research, Microsoft, Stanford University, and Cornell Tech [8]

Notable Papers
- The paper "A Theory of Response Sampling in LLMs" explores the heuristics that guide sampling in large language models (LLMs) and highlights ethical concerns regarding decision-making biases [11]
- "Fairness through Difference Awareness" introduces a framework for measuring group discrimination in LLMs, emphasizing the importance of group difference awareness in various contexts [13]
- "Language Models Resist Alignment" reveals that large models possess an inherent elasticity mechanism that makes them resistant to alignment efforts, posing challenges for AI safety and alignment [16][17]
- The paper "Native Sparse Attention" presents a new attention mechanism designed for efficient long-context modeling, demonstrating superior performance compared to existing sparse attention methods [24][28]

Awards for Specific Papers
- The best demo paper award went to "OLMoTrace," which can trace language model outputs back to trillions of training tokens, a significant advance in understanding model behavior [32]
- The best theme paper award was given to "MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection," which proposes a new adaptation method for fine-tuning large models with minimal parameters [34]

Lifetime Achievement and Service Awards
- The ACL Lifetime Achievement Award was presented to Professor Kathy McKeown for her extensive contributions to the field of NLP over 43 years [57][60]
- The Distinguished Service Award was awarded to Professor Julia B. Hirschberg for her long-standing service to ACL and contributions to the fields of NLP and speech processing [62]
DeepSeek's Next-Generation Technology Revealed Early: Paper Co-Authored by Liang Wenfeng Wins ACL 2025 Best Paper Award
QbitAI (量子位) · 2025-07-30 23:56
Core Insights
- The article highlights a paper co-authored by DeepSeek's Liang Wenfeng and Peking University, which won the Best Paper Award at ACL 2025 [1]
- The conference reached an unprecedented scale, with 8,360 submissions in total, nearly double last year's 4,407, indicating fierce competition [2]

Technical Innovations
- The proposed Native Sparse Attention (NSA) mechanism speeds up long-text processing by up to 11 times through combined algorithm and hardware optimization, while outperforming traditional full attention models [3][8]
- The technology allows context length to be extended up to 1 million tokens and is set to be applied in next-generation models [4]
- NSA employs a dynamic hierarchical sparse strategy with three parallel attention branches: coarse-grained compression for global information capture, selective attention for key segments, and sliding-window attention for local context [10][17]

Performance Metrics
- In practical tests on 64k-length sequences, NSA showed speed advantages across the entire lifecycle: decoding was 11.6 times faster, forward propagation 9 times faster, and backward propagation 6 times faster [15][16]
- The 27B-parameter model pre-trained with NSA surpassed the full attention baseline on 7 out of 9 evaluation metrics, particularly excelling in reasoning-related benchmarks [19][20]
- In long-text tests, NSA achieved perfect retrieval accuracy and outperformed the full attention baseline by 0.032 on the LongBench benchmark [21]

Comparative Analysis
- An experiment using DeepSeek-R1's mathematical reasoning data showed that NSA-R achieved an accuracy of 0.121 in an 8k context setting, significantly higher than the full attention model's 0.046 [22][23]
- NSA also outperformed full attention in complex reasoning tasks, with improvements of 0.087 on HPQ (HotpotQA) and 0.069 on code understanding tasks [25]

Additional Research Highlights
- The article mentions three other best paper winners, including a study on the resilience of large language models after alignment training, emphasizing the need for more effective alignment techniques [26]
- Another paper explored fairness in large models through a new perspective of "difference awareness," revealing that traditional fairness tests may not adequately address the nuances of model behavior [28]
- A third paper discussed the sampling mechanisms in large models, highlighting potential biases in decision-making processes that could lead to ethical concerns [29]
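
The three-branch design described under Technical Innovations above can be illustrated with a toy, single-query sketch. This is not the paper's implementation: the function names, the mean-pooling used for compression, and the fixed equal gates are all simplifying assumptions made here for illustration (the summary only states that there are three parallel branches for coarse-grained, selected, and local context). The query is treated as the most recent position, so all supplied keys are in its past.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for one query; returns (output, weights)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

def mean_pool(vectors):
    """Assumed stand-in for a learned block compressor."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sparse_attention(q, keys, values, block=4, top_k=2, window=4,
                     gates=(1 / 3, 1 / 3, 1 / 3)):
    # Branch 1: coarse-grained view, one pooled key/value per block.
    blocks = [(s, min(s + block, len(keys))) for s in range(0, len(keys), block)]
    ck = [mean_pool(keys[s:e]) for s, e in blocks]
    cv = [mean_pool(values[s:e]) for s, e in blocks]
    out_cmp, block_weights = attend(q, ck, cv)

    # Branch 2: keep only the top_k highest-scoring blocks at full resolution.
    ranked = sorted(range(len(blocks)), key=lambda i: block_weights[i], reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [t for b in chosen for t in range(*blocks[b])]
    out_sel, _ = attend(q, [keys[t] for t in idx], [values[t] for t in idx])

    # Branch 3: sliding window over the most recent tokens.
    out_win, _ = attend(q, keys[-window:], values[-window:])

    # Combine the branches with fixed gates (an assumption; gates could be learned).
    g_cmp, g_sel, g_win = gates
    out = [g_cmp * a + g_sel * b + g_win * c
           for a, b, c in zip(out_cmp, out_sel, out_win)]
    return out, chosen

keys = [[math.sin(t), math.cos(t)] for t in range(16)]
values = [[float(t), 1.0] for t in range(16)]
output, selected_blocks = sparse_attention([1.0, 0.0], keys, values)
```

In this toy setting the sequence is too short for the sparsity to save work; the point is only the data flow. With long sequences, each branch touches far fewer tokens than full attention would.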
Just In: DeepSeek Liang Wenfeng's NSA Paper and Peking University's Yang Yaodong Team Win ACL 2025 Best Paper Awards
Synced (机器之心) · 2025-07-30 16:25
Group 1
- The ACL conference is a premier event in the field of computational linguistics and natural language processing, with the 63rd edition scheduled for July 27 to August 1, 2025, in Vienna, Austria [2]
- This year, the total number of submissions reached a record high of over 8,000, compared to 4,407 last year, with acceptance rates of 20.3% for main conference papers and 16.7% for Findings [3]
- Over half of the first authors of the submitted papers are from China (51.3%), a significant increase from last year's 30.6%, while the second-largest group of authors comes from the United States at 14.0% [4]

Group 2
- Four best papers were awarded, including two from the teams of DeepSeek's Liang Wenfeng and Peking University's Yang Yaodong, with the other two awarded to teams from CISPA Helmholtz Center for Information Security & TCS Research & Microsoft, and Stanford University & Cornell Tech [6][10]
- The first best paper discusses a theory of response sampling in large language models (LLMs), highlighting the ethical concerns arising from biases in decision-making processes influenced by LLMs [11][15]
- The second best paper focuses on algorithmic fairness, introducing a framework that emphasizes awareness of group differences in specific contexts and demonstrating that existing bias mitigation strategies may be counterproductive [16][19]

Group 3
- The third best paper reveals a structural inertia mechanism in large models that resists alignment during fine-tuning, indicating that achieving robust alignment is more challenging than previously thought [24][25]
- The fourth best paper presents a new hardware-aligned and natively trainable sparse attention mechanism, which significantly improves efficiency in long-context modeling for LLMs [31][40]

Group 4
- A total of 26 outstanding papers were recognized, covering various topics such as multilingual summarization, hate speech analysis, and the evaluation of large language models [42]
- The best demo paper was awarded to OLMoTrace, a system capable of tracing language model outputs back to trillions of training tokens [46][48]

Group 5
- The ACL 2025 conference also presented two Test-of-Time Awards, celebrating foundational papers from 2000 and 2015 that have significantly influenced the field [65][73]
- Kathy McKeown received the Lifetime Achievement Award for her extensive contributions to natural language processing over 43 years [86][90]
- Julia B. Hirschberg was awarded the Distinguished Service Award for her long-standing service to the ACL and contributions to the field [96][98]
The "Step" Moment for Domestic AI Computing Power
Guan Cha Zhe Wang· 2025-07-30 09:26
Core Insights
- The event highlighted the collaboration among leading domestic computing chip companies and the launch of the new multi-modal reasoning model Step 3 by StepFun (阶跃星辰), showcasing the strong adaptability of domestic chips [3][5][12]
- The establishment of the "Model-Chip Ecological Innovation Alliance" aims to synchronize product development among hardware manufacturers and enhance strategic cooperation [12][19]
- StepFun's revenue guidance for the year is projected to reach 1 billion yuan, indicating a strong market position compared to competitors [13][14]

Group 1: Model and Chip Integration
- The Step 3 model demonstrates a 300% inference efficiency improvement on domestic chips compared to DeepSeek-R1, and over 70% improvement in distributed inference on the NVIDIA Hopper architecture [6][8]
- StepFun's approach integrates model development with hardware characteristics from the outset, addressing the inefficiencies of traditional development cycles [8][9]
- The new Multi-matrix Factorization Attention (MFA) architecture reduces key-value cache usage by 93.7%, making it more compatible with domestic chips [11]

Group 2: Market Position and Strategy
- StepFun has released over ten multi-modal models in the past year, positioning itself favorably in a market where multi-modal applications are increasingly sought after [15][16]
- The company has established significant partnerships with leading domestic smartphone manufacturers and automotive companies, enhancing its market reach [16]
- The rapid application of multi-modal models is expected to create a feedback loop that drives further model improvements [16]

Group 3: Shanghai's Role in AI Development
- Shanghai hosts a significant number of AI companies, with 24,733 registered AI enterprises in 2024, reflecting a 5.1% growth from the previous year [18]
- The city benefits from a robust industrial ecosystem, including major wafer fabs and advanced packaging capabilities, which support GPU companies [18][19]
- Shanghai's state-owned capital is actively investing in AI startups, indicating strong governmental support for the industry [18]
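DeepSeek Pushes for Beijing Stock Exchange Listing, Will Strategically Invest in Computing Power Leasing Over the Next 5 Years to Build an AI Infrastructure Ecosystem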
Sou Hu Cai Jing· 2025-07-30 07:50
Group 1
- DeepSeek, an AI unicorn, is set to initiate its IPO process on the Beijing Stock Exchange in November 2025, with a focus on computing power leasing as its core strategy for the next five years [1]
- The company plans to invest 3 billion yuan in building a self-controlled high-performance computing (HPC) center and collaborate with domestic chip manufacturers to create customized AI computing power solutions [1]
- DeepSeek has established strategic partnerships with domestic chip companies such as Huawei Ascend and Cambricon, aiming to support high computing power demand scenarios like large model training and autonomous driving simulation [3]

Group 2
- Industry experts indicate that DeepSeek's listing will accelerate the localization of AI computing infrastructure in China and is expected to capture over 35% of the domestic market share within the next 3-5 years [3]
A Tale of Three Cities in Open-Source Models
Hu Xiu· 2025-07-30 01:58
Core Insights
- The article discusses the competitive landscape of AI in China, focusing on the launch of new open-source models such as GLM-4.5 by Zhipu and the ongoing rivalry among Beijing, Shanghai, and Hangzhou in the AI sector [1][19]
- The emergence of open-source models is seen as a response to the U.S. AI action plan, with China aiming to accelerate the deployment of open-source AI globally [1][16]

Group 1: Open-Source Model Developments
- Zhipu has released the GLM-4.5 model, which has 355 billion total parameters and 32 billion active parameters [11]
- Alibaba has introduced several models, including Qwen3-Coder with 480 billion total parameters, which is priced at one-third of its competitor Claude 4, indicating a strong push in the open-source domain [3][5]
- The K2 model from Moonshot AI implements a self-critique reward mechanism to enhance its ability to handle complex tasks [10]

Group 2: Competitive Dynamics
- The competition among AI startups in Shanghai and Beijing has intensified, with companies like MiniMax and Moonshot AI rapidly updating their models to keep pace with market demands [6][9]
- The article highlights the "flywheel effect" initiated by DeepSeek, which has led to price wars and increased performance testing among open-source models [2]
- The collaboration and competition among these cities are likened to a "three-city drama," emphasizing the regional rivalry in AI development [1][19]

Group 3: Strategic Implications
- The open-source approach is seen as part of the culture at companies like DeepSeek, which aims to attract top talent and contribute to global innovation in AI [14]
- Alibaba's strategy aligns with its cloud computing identity, focusing on technology-first approaches rather than purely commercial ones [13]
- The article suggests that the open-source ecosystem in China could lead to rapid innovation and improvement, potentially surpassing proprietary models from the U.S. [17][19]
In Depth | Trajectory Planning Based on Deep Reinforcement Learning (with Code Walkthrough)
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- The article discusses the advancements and applications of reinforcement learning (RL) in autonomous driving, highlighting its potential to improve decision-making in dynamic environments.

Group 1: Background and Concepts
- The concept of VLA (Vision-Language-Action) models and its relation to embodied intelligence is introduced, emphasizing its similarity to end-to-end autonomous driving [3]
- Reinforcement learning has gained traction in various industries following significant milestones like AlphaZero in 2018 and ChatGPT in 2023, showcasing its broader applicability [3]
- The article aims to explain reinforcement learning from a computer vision perspective, drawing parallels with established concepts in the field [3]

Group 2: Learning Methods
- Supervised learning in autonomous driving involves tasks like object detection, where a model is trained to map inputs to outputs using labeled data [5]
- Imitation learning is described as a method where models learn actions by mimicking human behavior, akin to how children learn from adults [6]
- Reinforcement learning differs from imitation learning by optimizing actions based on feedback from interactions with the environment, making it suitable for sequential decision-making tasks [7]

Group 3: Advanced Learning Techniques
- Inverse reinforcement learning is introduced as a method to derive reward functions from expert data, particularly useful when defining rewards is challenging [8]
- The Markov Decision Process (MDP) is explained as a framework for modeling decision-making tasks, where states, actions, and rewards are interrelated [9]
- Dynamic programming and Monte Carlo methods are discussed as techniques for solving reinforcement learning problems [11][12]

Group 4: Reinforcement Learning Algorithms
- Reinforcement learning algorithms are categorized into on-policy and off-policy methods, which differ in training stability and data utilization [25][26]
- The article outlines key algorithms such as Q-learning, SARSA, and policy gradient methods, explaining their mechanisms and applications [27][29]
- Advanced algorithms like TRPO and PPO are presented, focusing on their strategies for ensuring stable training and optimizing policy updates [57][58]

Group 5: Applications in Autonomous Driving
- The importance of reward design in autonomous driving is emphasized, with safety, comfort, and efficiency being key factors [62]
- The article discusses the need for closed-loop training systems in autonomous driving, where vehicle actions influence the environment, necessitating dynamic modeling of other vehicles [62]
- The integration of end-to-end learning with reinforcement learning is highlighted as a method to adapt to changing environments in real time [63]
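
To make the MDP and Q-learning ideas in the summary above concrete, here is a minimal tabular Q-learning sketch. It is not the code from the original article: the five-state corridor environment, the `step` and `train` helpers, and all hyperparameters are hypothetical choices made for illustration. Q-learning is off-policy because its update target uses the best next action (`max(q[nxt])`) regardless of which action the exploring agent actually takes next; SARSA, by contrast, would plug in the action actually chosen.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy transition: reward 1.0 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, ties broken randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy` lists the preferred action in each non-terminal state. In a driving setting the reward would instead encode the safety, comfort, and efficiency terms mentioned above, and the table would be replaced by a neural network.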
21 Interview | EY's Wu Xiaoying: AI Healthcare Must Move from "Hyping Concepts" to "Real Implementation"
Core Insights
- The healthcare sector is a testing ground for new technologies, with generative AI significantly enhancing medical services and accelerating drug development [1][3]
- The 2025 World Artificial Intelligence Conference in Shanghai showcased over 800 companies and 3,000 cutting-edge exhibits, highlighting the rapid advancements in AI technology [1][2]

Industry Trends
- AI is transforming the entire healthcare process, including health management, diagnosis, imaging analysis, drug development, and surgical robotics, leading to improved efficiency and patient experience [3]
- The AI healthcare market is projected to grow from 97.3 billion yuan in 2023 to 159.8 billion yuan by 2028, indicating a positive future trend [3]

Challenges in AI Healthcare
- The industry faces significant challenges in moving from "technological feasibility" to "scalable application," including issues related to standardization, ecosystem fragmentation, and clinical translation [2][4]
- Key barriers to commercialization include data privacy and compliance, clinical validation and payment models, operational capabilities, and interoperability within healthcare systems [4]

Investment Landscape
- Major tech companies like Tencent, Ant Group, and Huawei are increasingly focusing on the AI healthcare sector, indicating a shift from conceptualization to practical commercialization [3][4]
- AI-native pharmaceutical companies are valued based on their model capabilities, computational efficiency, and data barriers, differing from traditional pharmaceutical valuation methods [5]

Regulatory Environment
- The FDA's recent initiatives, including the introduction of generative AI tools and the appointment of a Chief AI Officer, aim to modernize regulatory processes and enhance the integration of AI in drug approval [6][7]
- Chinese pharmaceutical companies looking to enter international markets must adapt to regulatory requirements and ensure compliance with FDA standards [7]

Data Utilization Strategies
- AI-driven synthetic control arms and real-world data simulations are being recognized by the FDA as valid methods for accelerating international multi-center trial designs [8]
- To address data standardization issues in emerging markets, companies should adopt international data models and utilize federated learning techniques to ensure data quality while maintaining patient privacy [8]
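
The federated learning approach mentioned in the last point above can be sketched with federated averaging on a toy problem. Everything here is a hypothetical illustration, not a method described in the interview: the one-parameter linear model, the `local_update` and `federated_average` helpers, and the two made-up "sites" are assumptions. The key property shown is that each site trains on its own records and only the model weight, never the raw data, is sent to the coordinator.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private data (model: y = w * x)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(global_w, sites, rounds=50):
    """Coordinator loop: broadcast the weight, collect local updates, average by sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w

# Two hypothetical sites whose private records both follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
learned_w = federated_average(0.0, sites)
```

Weighting each update by its sample count keeps larger sites from being under-represented. A real deployment would add protections such as secure aggregation, since model updates alone can still leak information.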
Top 10 Artificial Intelligence Trends for 2025
Sou Hu Cai Jing· 2025-07-29 16:39
Group 1
- The report titled "Coexistence Partners: Top 10 Trends in Artificial Intelligence for 2025" outlines significant trends in AI development, emphasizing the transition from "intelligent tools" to "coexistence partners" [1][7][26]
- The three main themes identified are the evolution of foundational models, the rise of intelligent agents, and AI's integration into the physical world [1][7][21]

Group 2
- The first trend highlights the breakthrough in reinforcement learning (RL), which is becoming a key force in enhancing the reasoning and action capabilities of large models, enabling them to solve complex scientific and engineering problems [2][36][39]
- The second trend focuses on native multimodal generation, which allows AI to deeply integrate various data types such as images, speech, and text, facilitating seamless interaction across modalities [2][49][50]
- The third trend discusses the evolution of voice models towards emotional intelligence, enabling AI to express context-aware emotional responses and enhancing human-machine interaction [2][3][48]

Group 3
- The rise of intelligent agents is characterized by two main development paths: orchestration agents for complex task automation and end-to-end agents that internalize reasoning and planning capabilities [3][4][18]
- The concept of LifeOS is emerging, where AI integrates user data to become a personalized digital self, enhancing user experience through long-term memory and personalized reasoning [3][4][19]
- The trend of "intelligence as a service" is reshaping industry workflows, embedding AI deeply into sectors like healthcare, finance, and manufacturing [3][4][26]

Group 4
- The report anticipates a "GPT-2 moment" for embodied intelligence in 2025, marking a significant leap from virtual computation to physical execution, with advancements in multimodal models and data engineering [4][6][21]
- Spatial intelligence is evolving, allowing AI to process and understand three-dimensional environments, which opens new opportunities in fields like autonomous driving and robotics [4][20][21]
- The commercialization of embodied intelligent robots is expected to accelerate, with companies like Tesla and Agility planning to produce around 1,000 units each for various applications [6][21][29]

Group 5
- The overall trends indicate a shift towards AI becoming a true coexistence partner, with enhanced capabilities in reasoning, emotional understanding, and physical interaction, fundamentally changing human-AI relationships [7][21][26]
- The report emphasizes the importance of building trust and collaboration with the next generation of AI, as it becomes more autonomous and capable [7][21][26]
I Spent Two Days at the WAIC Expo: Some Honest Impressions
佩妮Penny的世界· 2025-07-29 12:06
Core Insights
- WAIC 2025 saw a significant increase in scale and attendance, with tickets reselling for 1,300 to 3,000 yuan, indicating high demand [1][2]
- The exhibition area was the main attraction, featuring numerous companies and their innovative projects, while the forum sessions were more accessible and less focused on technical depth [1][2]

Group 1: Robotics and AI Innovations
- The exhibition featured a wide variety of robots performing tasks such as sorting, making tea, and even playing mahjong, showcasing the advancements in robotics [3][10]
- There is a growing interest in humanoid robots, as they are seen as the most adaptable to existing human infrastructure, with significant capital investment in the robotics sector [18][22]
- The current period is described as a "species explosion" for robots, with substantial capital inflow, although the overall revenue from humanoid robots may not yet match the income from major AI events [22]

Group 2: AI and Technology Companies
- Major companies like Huawei, Alibaba, and Tencent showcased their AI capabilities, with Alibaba's integration of AI across various business lines being particularly noted [25][27]
- The absence of companies like ByteDance and DeepSeek was mentioned, while other firms like Kuaishou presented their AI applications, indicating a diverse landscape of AI development [27][28]
- The event also highlighted the emergence of AI glasses and low-altitude economy technologies, reflecting ongoing innovation in the sector [29][31]

Group 3: Networking and Industry Trends
- The exhibition served as a platform for networking, with many informal discussions and potential collaborations occurring after official hours [33]
- The startup area had a more relaxed atmosphere than the large-company booths and was noted for its vibrant engagement [35]
- The event underscored the importance of capturing opportunities in the rapidly evolving AI and robotics landscape, emphasizing the need for continuous engagement and exploration [33][39]