NeurIPS 2025 Best Paper Awards Announced; Decade-Old Classic by Kaiming He, Jian Sun, and Colleagues Takes the Test of Time Award
36Kr · 2025-11-27 07:27
Core Insights
- NeurIPS 2025 announced its best paper awards, with four papers recognized, including a significant contribution from Chinese researchers [1][2]
- The Test of Time Award went to Faster R-CNN, highlighting its lasting impact on the field of computer vision [1][50]

Best Papers
- The first best paper, "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)", was authored by a team spanning several institutions, including the University of Washington and Carnegie Mellon University [5][6]
- The second best paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free", is a collaboration between researchers from Alibaba, the University of Edinburgh, Stanford University, MIT, and Tsinghua University (a minimal sketch of the gating idea follows this summary) [14][15]
- The third best paper, "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities", was authored by researchers from Princeton University and the Warsaw University of Technology [21][24]
- The fourth best paper, "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training", is a collaboration between PSL University and Bocconi University [28][29]

Runners-Up
- Three runner-up papers were also recognized, including "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" from Tsinghua University and Shanghai Jiao Tong University [33][34]
- Another runner-up, "Optimal Mistake Bounds for Transductive Online Learning", was authored by researchers from Kent State University, Purdue University, Google Research, and MIT [38][39]
- The third runner-up, "Superposition Yields Robust Neural Scaling", is from MIT [42][46]

Test of Time Award
- The Test of Time Award went to "Faster R-CNN", which has been cited more than 56,700 times and has profoundly shaped the computer vision field [50][52]
- The paper introduced a fully learnable two-stage pipeline that replaced traditional region-proposal methods, achieving high detection accuracy at near real-time speed [50][52]
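The gated-attention paper lends itself to a compact illustration. Below is a minimal PyTorch sketch of the idea as summarized here, not the authors' released code: a sigmoid gate computed from each token's hidden state is applied elementwise to the attention output before the output projection, adding non-linearity and letting a head suppress its own contribution (the property the paper connects to removing attention sinks). The module name, layer sizes, and causal-only setup are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Self-attention whose output is modulated by a sigmoid gate (sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, d_model)  # one gate value per output channel
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, time, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # sigmoid gate on the attention output: adds non-linearity and lets the
        # model drive a head's contribution toward zero instead of forming a sink
        g = torch.sigmoid(self.gate(x))
        return self.out(g * attn)

x = torch.randn(2, 16, 256)
print(GatedAttention(256, 8)(x).shape)  # torch.Size([2, 16, 256])
```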
Moonshot AI (月之暗面) Unveils a Reinforcement Learning Training Acceleration Method: Training Speed Up 97%, Long-Tail Latency Down 93%
量子位· 2025-11-27 04:34
Core Viewpoint
- The article introduces Seer, a new acceleration engine developed by Moonshot AI and Tsinghua University that significantly speeds up reinforcement learning (RL) training for large language models (LLMs) without altering the core training algorithm [1][8]

Summary by Sections

Performance Improvement
- Seer improves the rollout efficiency of synchronous RL by 74% to 97% and cuts long-tail latency by 75% to 93% [3][23]

Technical Architecture
- Seer consists of three main modules:
  1. Inference Engine Pool: built on DRAM/SSD, it holds multiple inference instances and a global KVCache pool for load balancing and data reuse [9]
  2. Request Buffer: acts as the unified entry point for all rollout requests, managing metadata and request states for precise resource scheduling [10]
  3. Context Manager: maintains context views for all requests and generates scheduling decisions from context signals [11]

Key Technologies
- Divided Rollout: breaks responses into independent requests and segments, reducing memory fluctuation and load imbalance (a toy scheduling sketch follows this summary) [12][13]
- Context-Aware Scheduling: uses a "speculative request" strategy to obtain length estimates for requests early, alleviating delays caused by long requests [17]
- Adaptive Grouped Speculative Decoding: exploits similar response patterns within a group to build a dynamic reference library for draft generation, improving decoding efficiency [19]

Experimental Validation
- In experiments on models including Moonlight, Qwen2-VL-72B, and Kimi-K2, Seer delivered 74% to 97% higher throughput than the baseline system veRL, with markedly lower long-tail latency [21][23]
- In the Moonlight task, for example, the last 10% of requests took 3,984 seconds under veRL versus 364 seconds under Seer, an 85% reduction in long-tail latency [23]

Financing and Future Plans
- Moonshot AI is reportedly close to completing a new funding round of several hundred million dollars, which could lift its valuation to $4 billion [32][33]
- The company is in talks with investors including IDG Capital and existing shareholder Tencent, aims to close the round by the end of the year, and plans to start an IPO process next year [36][37]
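As a rough intuition for how divided rollout and context-aware scheduling work together, the toy Python sketch below chunks each generation into fixed-size segments and always resumes the request with the most estimated work remaining, so long responses stop dominating the tail. This is not Seer's actual interface: the function name, chunk size, and the length estimates (standing in for the "speculative request" probe taken from a finished group sibling) are all illustrative assumptions.

```python
import heapq

def divided_rollout(length_estimates, chunk=512):
    """Decode in fixed-size segments, longest-estimated-remaining request first.

    length_estimates: {request_id: estimated_total_tokens}, e.g. borrowed from
    the first finished member of the same sampling group.
    """
    # heap entries: (negative remaining tokens, request id, tokens decoded so far)
    heap = [(-est, rid, 0) for rid, est in length_estimates.items()]
    heapq.heapify(heap)
    trace = []
    while heap:
        neg_remaining, rid, done = heapq.heappop(heap)
        step = min(chunk, -neg_remaining)       # stand-in for decoding one segment
        done, remaining = done + step, -neg_remaining - step
        trace.append((rid, done))
        if remaining > 0:
            heapq.heappush(heap, (-remaining, rid, done))
    return trace

# toy usage: three requests with very different estimated lengths
print(divided_rollout({"a": 300, "b": 4000, "c": 1200}, chunk=512))
```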
Sun Yat-sen University's Latest Paper in a Cell Sister Journal: AI Can Help Doctors Overcome Technical Barriers, but Carries a Risk of Dependence
生物世界· 2025-11-27 04:11
Written by Wang Cong | Edited by Wang Duoyu | Layout by Shui Chengwen

In recent years, interdisciplinary research that brings together biology, chemistry, physics, materials science, computer science, engineering, and other fields has driven breakthroughs across many areas of science and opened new avenues for growth. In digital medicine, for example, the fusion of knowledge and techniques from clinical practice, computer science, and other disciplines has greatly improved healthcare delivery, patient engagement, clinical outcomes, and the optimization of healthcare systems.

However, although technologies such as artificial intelligence (AI) show enormous potential in biomedicine, their broad adoption is severely limited by technical barriers. Physicians can contribute valuable clinical insight and first-hand experience, but their participation in problem-driven research involving AI can be greatly hindered by a lack of the necessary multidisciplinary expertise or skills and by difficulty obtaining support from engineers. The challenge is especially acute for small research or clinical teams at remote hospitals or universities, and for young physicians with limited access to research resources, interdisciplinary collaboration, and technical support.

On November 26, 2025, Professor Haotian Lin's team at the Zhongshan Ophthalmic Center of Sun Yat-sen University published a paper in the Cell sister journal Cell Reports Medicine titled: The effectiveness of large language mo ...
NeurIPS 2025 Awards Announced: Qwen Takes a Best Paper Award, Faster R-CNN Wins the Test of Time Award
机器之心· 2025-11-27 03:00
Core Insights
- The NeurIPS 2025 conference presented four Best Paper awards and three Best Paper Runner-up awards, highlighting significant advances across AI research areas [1][4]

Group 1: Best Papers
- Paper 1: "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)" examines the limited diversity of content generated by large language models and introduces Infinity-Chat, a dataset of 26,000 diverse user queries for studying model diversity [5][6][9]
- Paper 2: "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" analyzes how gated attention mechanisms affect model performance and stability, demonstrating significant improvements in the Qwen3-Next model [11][16]
- Paper 3: "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities" shows that scaling network depth to 1,024 layers improves self-supervised reinforcement learning on goal-reaching tasks, with gains of 2x to 50x [17][18]
- Paper 4: "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training" identifies the mechanisms that keep diffusion models from memorizing training data, linking training dynamics to generalization [19][21][22]

Group 2: Best Paper Runners-Up
- Paper 1: "Optimal Mistake Bounds for Transductive Online Learning" resolves a 30-year-old open problem in learning theory by establishing optimal mistake bounds for transductive online learning [28][30][31]
- Paper 2: "Superposition Yields Robust Neural Scaling" argues that representation superposition is the primary mechanism behind neural scaling laws, supported by multiple experiments [32][34]

Group 3: Special Awards
- The Test of Time Award went to "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", recognized for its foundational impact on modern object detection frameworks since its publication in 2015 [36][40]
- The Sejnowski-Hinton Prize was awarded to "Random synaptic feedback weights support error backpropagation for deep learning", which significantly advanced the understanding of biologically plausible learning rules in neural networks (a minimal sketch of its core idea follows this summary) [43][46][50]
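Since the Sejnowski-Hinton Prize paper is the least familiar item on this list, a minimal NumPy sketch of its core idea (feedback alignment) may help: the backward pass propagates the output error through a fixed random matrix B instead of the transpose of the forward weights, and learning still works because the forward weights gradually align with B. The toy regression task, network sizes, and learning rate are arbitrary choices for illustration, not anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 30, 64, 10
W1 = rng.normal(0, 0.1, (n_in, n_hid))
W2 = rng.normal(0, 0.1, (n_hid, n_out))
B = rng.normal(0, 0.1, (n_out, n_hid))        # fixed random feedback weights

X = rng.normal(size=(256, n_in))
Y = X @ rng.normal(size=(n_in, n_out))        # random linear teacher as a toy target

lr = 0.01
for step in range(2001):
    h = np.tanh(X @ W1)
    y_hat = h @ W2
    e = y_hat - Y                             # output error
    delta_h = (e @ B) * (1 - h ** 2)          # error sent backward through B, not W2.T
    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ delta_h / len(X)
    if step % 500 == 0:
        print(step, float((e ** 2).mean()))   # track mean squared error during training
```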
Former a16z Partner's Landmark Tech Report: How AI Is Eating the World
Hua Er Jie Jian Wen· 2025-11-26 12:08
Core Insights
- Generative AI is driving a platform shift in the tech industry comparable to the transitions that have occurred every 10 to 15 years, with the 2022 launch of ChatGPT a likely starting point for this change [1][4][5]

Investment Trends
- Major tech companies, including Microsoft, AWS, Google, and Meta, are projected to invest $400 billion in AI infrastructure in 2025, surpassing the global telecom industry's annual investment of roughly $300 billion [4][11]
- That 2025 projection has nearly doubled within a year, indicating rapidly escalating capital allocation to AI [14]

Historical Context of Platform Shifts
- The tech industry has repeatedly undergone platform shifts, such as from mainframes to PCs and from the web to smartphones, and these transitions have often displaced early leaders like Microsoft and Apple [5][11]
- Early leaders often fade during such transitions; Microsoft's operating-system market share, for example, fell from nearly 100% to below 20% by 2025 [5]

Current State of AI Development
- Despite heavy investment, the exact shape of the generative-AI platform shift remains unclear, with several potential user-interface paradigms still being explored [10]
- Data-center construction in the U.S. is outpacing office construction, driven by the new AI investment cycle [17]

Market Dynamics and Competition
- The performance gap among leading large language models is narrowing, suggesting the models may be becoming commoditized and that value capture in the market could be reshuffled [23]
- Companies must look for new competitive advantages in computational scale, vertical data, product experience, or distribution channels [26]

User Engagement Challenges
- Despite claims of 800 million weekly active users for ChatGPT, actual engagement is low: only about 10% of U.S. users use AI chatbots daily [27][30]
- There is a significant gap between technological capability and practical application, and many enterprises remain slow to deploy AI solutions [33][36]

Transformative Potential in Advertising
- AI is expected to transform advertising and recommendation systems by understanding user intent rather than relying solely on relevance, potentially rewriting the mechanics of the trillion-dollar advertising market [37]

Future Outlook
- The future of AI combines clarity and ambiguity: it is expected to reshape industries, but the final product forms and value-chain leaders remain uncertain [44]
- Competition is shifting toward capital intensity, as companies like Microsoft raise capital expenditure relative to sales, reflecting a fundamental change in competitive dynamics [45]
"The Mainstream Path of AI Development Has Hit a Bottleneck"
第一财经· 2025-11-26 09:52
Core Insights
- Ilya Sutskever's central argument is that the current mainstream AI development path has hit a bottleneck, marking the end of the scaling era and a return to a research-focused paradigm [4][5]

Group 1: AI Development Phases
- Sutskever divides AI research into three phases: 2012 to 2020 was the research era, 2020 to 2025 was the scaling era, and the field is now shifting back to a research era because the returns from scaling are diminishing [4]
- He stresses that although computational power has increased enormously, it no longer guarantees better performance, blurring the line between scaling and wasted compute [4]

Group 2: Generalization and Model Limitations
- A fundamental obstacle on the path to AGI is that large models generalize far worse than humans [5]
- Sutskever notes that current models score well on many evaluations yet still make simple mistakes, suggesting that training data may be too narrow and that evaluation performance has become disconnected from real-world performance [6]

Group 3: Emotional Intelligence in AI
- Sutskever suggests that current AI may lack a form of emotional intelligence, which could serve as a guiding value function essential for effective decision-making [7]
- He draws a parallel with people who have lost the ability to process emotions, arguing that emotions play a crucial role in decision-making and may be a missing ingredient in AI development [7]

Group 4: Alternative Perspectives in AI
- Turing Award winner Yann LeCun criticizes the limitations of large language models (LLMs), arguing that they cannot perform complex reasoning and are merely statistical models [8]
- LeCun advocates "world models" that learn from visual information, much as young animals do, as a more promising direction for AI development [8][9]
- Fei-Fei Li likewise emphasizes building world models that understand spatial relationships and interactions, pointing to a new AI paradigm that is generative, multimodal, and interactive [9]

Group 5: Industry Consensus
- There is no consensus in the AI industry on the path forward, but it is clear that the era of simply adding computational power is over, requiring a reevaluation of the paradigms that will lead to AGI [9]
Xiaomi's Large Model Surfaces for the First Time: 6.4 Billion Parameters, Ranked No. 1 Among Chinese-Oriented Large Models on CMMLU
Xin Lang Ke Ji· 2025-11-26 08:25
In its first-quarter earnings report this year, Xiaomi said that in April 2023 the group formally assembled a large-model team within its AI Lab. Xiaomi currently has more than 1,200 R&D staff working in AI-related areas.

Recently, Xiaomi's large language model MiLM-6B appeared for the first time on the C-Eval and CMMLU model evaluation leaderboards. According to the available information, MiLM-6B is a large-scale pre-trained language model developed by Xiaomi with 6.4 billion parameters. As of now, MiLM-6B ranks 10th on the overall C-Eval leaderboard and 1st among models of the same parameter scale, and ranks 1st among Chinese-oriented large models on CMMLU. ...
WPS 365 Upgraded to a Global One-Stop AI Collaborative Office Platform; International Edition to Launch by Year-End
Zheng Quan Ri Bao· 2025-11-26 08:09
Core Insights
- Kingsoft Office has upgraded WPS 365 into a global one-stop AI collaboration platform, introducing new products such as WPS Lingxi Enterprise Edition and Team Space, with the goal of covering all mainstream platforms worldwide [3]
- WPS 365 integrates messaging, documents, meetings, email, and a smart document library, achieving unified access, integration, data, and control to maximize organizational efficiency [1][3]
- The WPS 365 AI middle platform has been applied in multiple industries, strengthening document-intelligence capabilities such as smart retrieval and analysis of accumulated data [2]

Product Features
- WPS 365 will launch an international version by the end of the year, supporting cross-regional, cross-language collaboration and compatibility with Microsoft 365 [1]
- The upgraded smart document library uses technologies such as OCR, LLM, and NLP to turn scattered documents into reusable knowledge (a generic sketch of this pattern follows this summary) [2]
- The digital employee has been upgraded to version 2.0, serving as an intelligent agent built on a company's private knowledge and a key piece of organizational understanding [2]

Strategic Goals
- The AI middle platform aims to activate enterprise-wide knowledge, enabling better decision-making by combining large-model engines with proprietary knowledge [2]
- Kingsoft Office positions these moves as a response to the challenges facing expanding organizations, such as proliferating internal systems and data-leakage risks [1]
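The article does not describe WPS's internal implementation, so the sketch below only illustrates the generic pattern the smart document library alludes to: text extracted from heterogeneous files (hard-coded strings here, standing in for OCR and format-parser output) is indexed and then queried. A plain TF-IDF index is used as a stand-in for the LLM/embedding layer, and the file names, texts, and `search` helper are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# stand-ins for text recovered from scattered documents via OCR / format parsers
docs = {
    "contract_2023.pdf": "service agreement between party A and party B, renewal terms, liability",
    "meeting_notes.docx": "Q3 roadmap discussion, AI middle platform rollout, owner assignments",
    "policy_v2.txt": "data security policy: documents must not leave the tenant without approval",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs.values())   # build the retrieval index once

def search(query: str, top_k: int = 2):
    """Return the top_k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    return sorted(zip(docs, scores), key=lambda kv: -kv[1])[:top_k]

print(search("data security requirements"))
```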
Yang Zhenyuan: A ByteDance Team Had Trained a Large Language Model Back in 2021, but at the Time We "Lacked the Vision"
36Kr · 2025-11-25 11:26
Core Insights
- ByteDance has explored technology aggressively since its founding, with a focus on large-scale machine learning systems for recommendation algorithms [1][5][34]
- The company has made significant advances in AI, notably its AI dialogue assistant "Doubao" and a leading position in China's MaaS market through Volcano Engine [2][34]
- ByteDance is investing heavily in XR technology, aiming to improve the user experience through better hardware and software [22][30]

Group 1: Technology Development
- In 2014, ByteDance set the ambitious goal of building a recommendation system with a trillion-scale feature space, built on large-scale machine learning [5][9]
- The company initially underestimated the potential of large language models, but pivoted quickly to invest in the area from 2022 onward, leading to successful applications [34][35]
- ByteDance developed a stable training system called MegaScale, reaching floating-point utilization above 55%, about 1.3 times that of mainstream open-source frameworks (a sketch of how such a figure is typically computed follows this summary) [34]

Group 2: AI and Machine Learning
- The company recognized the importance of large-scale data for building valuable models and algorithms, particularly for real-world applications [10][34]
- ByteDance's AI dialogue assistant "Doubao" has become the most popular in China, showcasing the company's success in AI applications [2][34]
- The company is also pursuing frontier AI research, including the Seed Edge plan focused on cutting-edge large-model research [35]

Group 3: XR Technology
- ByteDance acquired the Pico team in 2021 to strengthen its XR capabilities, focusing on both content and foundational technology [22][30]
- The company is targeting a pixel density (PPD) of nearly 4,000, far higher than existing products, to improve clarity in XR experiences [26][29]
- ByteDance is developing a dedicated consumer-electronics chip to remove processing bottlenecks in mixed-reality applications, achieving system latency of around 12 milliseconds [31][30]
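A floating-point utilization figure like "above 55%" is usually reported as model FLOPs utilization (MFU). The article does not spell out MegaScale's exact accounting, so the sketch below uses the common approximation of 6 FLOPs per parameter per training token, and every number in the example (model size, GPU count, throughput, peak TFLOPS) is made up purely for illustration.

```python
def mfu(params: float, tokens_per_s: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization with the 6 * params FLOPs-per-token approximation."""
    achieved_flops_per_s = 6.0 * params * tokens_per_s   # forward + backward pass
    return achieved_flops_per_s / (n_gpus * peak_flops_per_gpu)

# hypothetical run: a 70B-parameter model on 1,024 GPUs with ~989 TFLOPS peak each
print(f"{mfu(params=70e9, tokens_per_s=1.3e6, n_gpus=1024, peak_flops_per_gpu=989e12):.1%}")
# -> roughly 54% with these made-up numbers
```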
The 16th IEEE International Conference on Cloud Computing Technology and Science Wraps Up
Zhong Guo Xin Wen Wang· 2025-11-25 09:24
Core Insights
- The 16th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2025) was recently held in Shenzhen, hosted by Shenzhen MSU-BIT University, attracting more than 200 leading scholars, academicians, and industry experts to discuss advances in cloud computing, edge computing, big data, and security and privacy [1][2]

Group 1: Key Presentations
- Professor Abdallah Shami of Western University, Canada, delivered a keynote on "Automated Network Intelligence: Driving 5G and Future Development," emphasizing the critical role of artificial intelligence in the evolution of 5G and future networks [1]
- Professor Xu Ke of Tsinghua University presented "Secure Internet Architecture and Key Technologies," sharing forward-looking ideas for building safer and more reliable network architectures [1]
- Academician Gong Jianya of Wuhan University discussed "Challenges and Thoughts on Intelligent Interpretation of Remote Sensing," highlighting application and development trends of remote-sensing technology in intelligent interpretation [1]
- Academician Weihua Zhuang of the University of Waterloo focused on "6G Intelligent Network Management," exploring new opportunities and challenges for network management in the 6G era [1]

Group 2: Additional Expert Contributions
- The conference also featured presentations by experts such as Professor Li Nan of the National University of Defense Technology, Professor Duan Lingjie of the Hong Kong University of Science and Technology (Guangzhou), Professor Chen Jiachao of Sun Yat-sen University, and Professor Xu Ruifeng of Harbin Institute of Technology (Shenzhen), covering topics including 6G semantic communication, human-machine feedback learning, and AI applications in Web3 finance [2]
- Over the three-day conference, multiple parallel sessions addressed popular fields such as cloud scheduling optimization, federated edge learning, 5G and AI security, intelligent IoT, and large language models, discussing specific technical issues such as emotion recognition, drone resource allocation, digital twins, and task offloading [2]