Nature | AdaptiveNN: Modeling Human-like Adaptive Perception to Break Machine Vision's "Impossible Triangle"
机器之心· 2025-11-28 04:11
Core Insights
- The article discusses the significant advances in computer vision and the challenges of deploying high-precision models in resource-constrained environments, such as robotics and autonomous driving, due to increased computational demands and energy consumption [2][3].
- It highlights the limitations of existing global representation-learning paradigms, which process all pixels of an image or video simultaneously, leading to inefficient use of energy and compute [3].
- The article introduces the AdaptiveNN architecture, which emulates human-like adaptive vision by modeling visual perception as a sequential decision-making process, enabling efficient and flexible machine visual perception [7][11].

Group 1: Challenges in Current Computer Vision Models
- High-precision models require activating millions of parameters, increasing power consumption, storage needs, and response latency, which makes them difficult to deploy in real-world applications [2].
- The global parallel-computation paradigm creates a significant energy-efficiency bottleneck: computational complexity grows with input size, making it hard to balance high-resolution input, performance, and efficient inference [3].

Group 2: Insights from the Human Visual System
- Human vision selectively samples key areas rather than processing all visual information at once, which greatly reduces computational overhead and allows efficient functioning even in resource-limited scenarios [5].
- The concept of "active observation" proposed by the researchers emphasizes that AI systems should adopt a human-like, task-driven approach to visual perception [5].

Group 3: Introduction of AdaptiveNN
- The AdaptiveNN architecture models visual perception as a multi-step sequential decision process, allowing the model to focus on specific regions of interest and accumulate information progressively (a minimal sketch follows this list) [11].
- The architecture combines representation learning with self-rewarding reinforcement learning, enabling the model to optimize its attention and decision-making without additional supervision [15][16].

Group 4: Performance and Efficiency of AdaptiveNN
- In extensive experiments, AdaptiveNN reduced inference cost by up to 28x while maintaining accuracy comparable to traditional static models, demonstrating its potential for efficient visual perception [7][22].
- The model's attention mechanism automatically focuses on discriminative regions, enhancing interpretability and aligning closely with human visual behavior [22][26].

Group 5: Broader Implications and Future Research
- The findings from AdaptiveNN offer insights for cognitive science, particularly in understanding human visual behavior and the mechanisms behind visual decision-making [25].
- The architecture's application in embodied-intelligence models shows significant improvements in reasoning and perception efficiency, suggesting a promising direction for future research at the intersection of AI and cognitive science [29].
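To make the sequential, fixation-based perception described above concrete, here is a minimal sketch of a glimpse-based model in that spirit: at each step the network crops a small patch, updates a recurrent state, and a small policy head chooses the next fixation, with prediction confidence serving as a self-generated reward. All sizes, module names, and the reward definition are illustrative assumptions, not the Nature paper's implementation.

```python
# A minimal sketch of fixation-based sequential perception; sizes, module
# names, and the confidence-as-reward signal are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, HIDDEN, CLASSES, STEPS = 32, 256, 10, 4

class GlimpseAgent(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(              # encodes one fixated patch
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, HIDDEN))
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)      # accumulates evidence across fixations
        self.policy = nn.Linear(HIDDEN, 2)         # proposes next fixation (x, y) in [-1, 1]
        self.classifier = nn.Linear(HIDDEN, CLASSES)

    def crop(self, img, loc):
        # Differentiable PATCH x PATCH crop centered at `loc`, via grid_sample.
        b, side = img.size(0), img.size(-1)
        ys = torch.linspace(-1, 1, PATCH, device=img.device) * (PATCH / side)
        grid = torch.stack(torch.meshgrid(ys, ys, indexing="xy"), dim=-1)
        return F.grid_sample(img, grid.unsqueeze(0) + loc.view(b, 1, 1, 2),
                             align_corners=False)

    def forward(self, img):
        h = img.new_zeros(img.size(0), HIDDEN)
        loc = img.new_zeros(img.size(0), 2)        # start looking at the center
        for _ in range(STEPS):                     # multi-step sequential decision process
            h = self.rnn(self.encoder(self.crop(img, loc)), h)
            loc = torch.tanh(self.policy(h))       # decide where to look next
        logits = self.classifier(h)
        # Self-rewarding signal: the model's own confidence, usable as an RL
        # reward for the fixation policy without extra supervision.
        return logits, logits.softmax(-1).max(-1).values

model = GlimpseAgent()
logits, reward = model(torch.randn(2, 3, 224, 224))
print(logits.shape, reward.shape)  # torch.Size([2, 10]) torch.Size([2])
```

Because only STEPS small patches are processed rather than the full image, compute stays roughly constant regardless of input resolution, which is the efficiency property the article attributes to AdaptiveNN.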
Huawei Unveils Its Big Move for "Near-Trillion-Parameter MoE Inference," Open-Sourcing Two Killer Optimization Technologies
机器之心· 2025-11-28 04:11
Reported by 机器之心. Editor: Du Wei.

As 2025 draws to a close, large models have spent the year accelerating their shift from single-point efficiency tools to foundational infrastructure underpinning business systems. Along the way, inference efficiency has come to determine whether large models can truly land in production. For ultra-large-scale MoE models, complex inference pipelines pose challenges in computation, communication, and memory access, and the industry urgently needs an efficient, controllable inference path.

Huawei has unveiled a complete technology stack for near-trillion-parameter MoE inference: openPangu-Ultra-MoE-718B-V1.1, which demonstrates the model potential of the MoE architecture, together with Ascend-affinity acceleration technologies including the Omni Proxy scheduling feature and AMLA, which pushes Ascend hardware compute utilization to 86%. Together these make production-grade deployment of ultra-large-scale MoE models practically feasible. Open-source implementation: https://gitcode.com/ascend-tribe/ascend-inference-cluster#

If the focus of large-model competition in past years was training scale and capability breakthroughs, inference efficiency is now rapidly becoming the key variable in whether a model can land. Model GitCode address: https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1-Int8 In terms of task characteristics, ...
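For readers unfamiliar with why MoE inference is so efficiency-sensitive, the sketch below shows generic top-k expert routing: each token activates only k of E experts, so per-token compute is a small fraction of the total parameter count. This is a textbook illustration of MoE, not the openPangu or Ascend implementation.

```python
# A generic top-k MoE routing sketch; layer sizes and expert count are
# illustrative, not the openPangu / Ascend implementation.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=512, n_experts=64, k=4):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)      # router: token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))

    def forward(self, x):                          # x: (tokens, dim)
        weights, picks = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tok, slot = (picks == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if tok.numel():
                out[tok] += weights[tok, slot].unsqueeze(-1) * expert(x[tok])
        return out

moe = TopKMoE()
y = moe(torch.randn(16, 512))
print(y.shape)  # each token ran only 4 of 64 experts: ~1/16 of the dense FLOPs
```

The scheduling and memory-access challenges mentioned above arise precisely because which experts run depends on the data, making load balancing and communication across devices the hard part at near-trillion scale.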
Academia Erupts! ICLR Reviewers Unmasked, and It Turns Out the Low Score Came from a Friend
机器之心· 2025-11-28 00:51
Reported by the 机器之心 editorial team. Who knows how many people stayed up all night last night.

On the evening of November 27, Beijing time, China's AI community erupted. On OpenReview, the platform most widely used for academic paper review, a front-end bug exposed the database, turning what was supposed to be double-blind review into an open hand.

The leak could hardly have been simpler to exploit: enter a certain URL in your browser, substitute the paper ID and reviewer number you want to look up, and you can identify any corresponding reviewer. You can learn who reviewed your paper and what score he or she gave you.

With no barrier to entry, once word spread everyone instantly switched into investigation mode; after all, who doesn't have some friction with reviewers these days, and grievances could finally be settled. This instantly produced countless surprises, shocks, bursts of anger, and wails. WeChat groups and Xiaohongshu were full of victims telling their stories, some doing the exposing and some being exposed. You can never guess who gave your paper a low score.

Reviewers' reasons for low scores vary: some failed to grasp the authors' intent, some acted on personal grudges (for example, labmates scoring each other down), and most objectionable of all, some gave low scores to "clear the way" for a competing paper they were writing in the same track. One person used the leak to confirm that the reviewer who once gave their paper a 1 submitted another paper five months later and still refused to cite the author's submission.

Truly open ...
The "Preference" Dilemma of Large Models as Evaluators: UDA Achieves Unsupervised Debiased Alignment
机器之心· 2025-11-28 00:51
Core Insights
- The article discusses the problem of preference bias in large language models (LLMs) acting as judges, noting that even advanced models like GPT-4o and DeepSeek-V3 exhibit systematic favoritism toward their own outputs, leading to significant discrepancies in scoring and ranking [2][4][5].
- The introduction of Unsupervised Debiasing Alignment (UDA) offers a new approach to this bias: models autonomously adjust their scoring rules through unsupervised learning, achieving debiased alignment [2][7].

Summary by Sections

Problem Statement
- Current LLM judging systems, such as Chatbot Arena, face three main challenges: self-preference solidification, heterogeneity bias, and static scoring defects [4][5].
- Self-preference solidification leads models to overestimate their own answers, creating a "whoever judges wins" scenario [4].
- Heterogeneity bias means different models are biased in different directions and intensities, ranging from aggressive self-promotion to excessive humility [4].

UDA Contribution
- UDA recasts debiasing as a sequence-learning problem that can be optimized through dynamic calibration, allowing judges to autonomously explore optimal scoring strategies [7][25].
- The method uses consensus-driven training, treating the judges' collective agreement as a practical optimization target, which reduces overall bias [13][18].

Methodology
- UDA models pairwise evaluations as an instance-level adaptive process, dynamically generating adjustment parameters for each judge model during comparisons (a schematic sketch follows this list) [10][11].
- The system extracts multiple features from each comparison, including semantic feature vectors and self-perception features, which are crucial for detecting bias tendencies [11][20].

Experimental Results
- UDA significantly reduces inter-judge variance, lowering the average standard deviation from 158.5 to 64.8 and demonstrating its effectiveness at suppressing extreme biases [23].
- The average Pearson correlation with human evaluations improved from 0.651 to 0.812, indicating closer alignment with human judgment [23].
- UDA shows robust zero-shot transfer, achieving a 63.4% variance reduction on unseen datasets and demonstrating domain-agnostic debiasing [23].

Conclusion
- UDA shifts judgment calibration away from prompt engineering toward a learnable problem, improving the robustness and reproducibility of evaluations while aligning more closely with human judgment [25].
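The schematic sketch below illustrates the instance-level calibration idea referenced in the Methodology section: a small network maps per-comparison features to affine adjustments of each judge's raw score, trained so adjusted scores converge toward the panel consensus. The feature layout, network shape, anchor term, and loss are illustrative assumptions; UDA's actual architecture and objective may differ.

```python
# A schematic sketch of consensus-driven score calibration; feature layout,
# network shape, anchor term, and loss are illustrative assumptions.
import torch
import torch.nn as nn

class JudgeCalibrator(nn.Module):
    """Maps per-comparison features to an affine adjustment of a raw judge score."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, raw, feats):
        scale, shift = self.net(feats).unbind(-1)  # instance-level adjustment params
        return raw * (1 + torch.tanh(scale)) + shift

J, N, F_DIM = 5, 64, 8                             # judges, comparisons, feature dim
raw = torch.rand(J, N) * 10                        # raw scores from 5 judge models
feats = torch.randn(J, N, F_DIM)                   # e.g. semantic + self-perception features
calib = JudgeCalibrator(F_DIM)
opt = torch.optim.Adam(calib.parameters(), lr=1e-2)

for _ in range(200):
    adjusted = calib(raw, feats)                   # (J, N)
    consensus = adjusted.mean(0, keepdim=True).detach()
    # Consensus-driven objective: pull every judge toward the panel agreement,
    # with a mild anchor so scores do not drift away from the original scale.
    loss = ((adjusted - consensus) ** 2).mean() + 0.1 * (adjusted.mean() - raw.mean()) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

before = raw.std(0).mean().item()
after = calib(raw, feats).std(0).mean().item()
print(f"inter-judge std: {before:.3f} -> {after:.3f}")  # variance shrinks
```

Because the adjustment parameters are produced per comparison from the features, judge-specific biases can be corrected differently on each instance, which is the "dynamic calibration" the summary describes.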
DeepSeek Returns in Force with an Open-Source IMO Gold-Medal-Level Math Model
机器之心· 2025-11-27 12:13
Core Insights
- DeepSeek has released a new mathematical reasoning model, DeepSeek-Math-V2, which surpasses its predecessor, DeepSeek-Math-7b, achieving gold-medal-level performance in mathematical competitions [5][21].
- The model addresses a limitation of current AI mathematical reasoning by focusing on self-verification and rigorous proof processes rather than merely producing correct final answers [7][25].

Model Development
- DeepSeek-Math-V2 is based on the DeepSeek-V3.2-Exp-Base architecture and shows improved performance compared to Gemini DeepThink [5].
- The previous version, DeepSeek-Math-7b, used 7 billion parameters and achieved performance comparable to GPT-4 and Gemini-Ultra [3].

Research Limitations
- Current AI models often prioritize the accuracy of final answers, which does not ensure the correctness of the reasoning process [7].
- Many mathematical tasks require detailed step-by-step deduction, making a final-answer-only focus inadequate [7].

Self-Verification Mechanism
- DeepSeek emphasizes the need for comprehensive, rigorous verification of mathematical reasoning [8].
- The model introduces a proof-verification system that lets it check its own work and acknowledge its mistakes, improving reliability [11][17].

System Design
- The system consists of three roles: a proof verifier (teacher), a meta-verifier (supervisor), and a proof generator (student); a control-flow sketch follows this list [12][14][17].
- The proof verifier evaluates the reasoning process, while the meta-verifier checks the validity of the verifier's feedback, improving overall assessment accuracy [14].

Innovative Training Approach
- The proof generator is trained to self-evaluate its solutions, encouraging deeper reflection and correction of errors before finalizing answers [18].
- An honesty reward encourages the model to admit mistakes, fostering self-improvement [18][23].

Automation and Evolution
- DeepSeek has developed an automated process that lets the system evolve on its own, improving both the proof generator and the verifier over time [20].
- The approach shifts from results-oriented to process-oriented, centering on rigorous proof examination [20].

Performance Metrics
- DeepSeek-Math-V2 achieved impressive competition results, scoring 83.3% on IMO 2025 and 98.3% on Putnam 2024 [21][22].
- The model reached near-perfect performance, close to 99% accuracy, on the Basic track of IMO-ProofBench [22].

Future Directions
- DeepSeek acknowledges that while significant progress has been made, further work is needed to strengthen the self-verification framework for mathematical reasoning [25].
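The control-flow sketch below illustrates the three-role loop from the System Design section: a generator drafts a proof, a verifier critiques it, a meta-verifier audits the critique, and revision repeats until the proof passes. The `llm` stub, prompts, and string parsing are hypothetical stand-ins for readability, not DeepSeek's actual pipeline or reward implementation.

```python
# A control-flow sketch of the generator / verifier / meta-verifier loop.
# The `llm` call, prompts, and parsing are hypothetical stand-ins.
from dataclasses import dataclass

def llm(role: str, prompt: str) -> str:
    """Hypothetical model call; replace with a real inference client."""
    return "stub response for " + role

@dataclass
class Verdict:
    valid: bool
    feedback: str

def verify(proof: str) -> Verdict:
    # The verifier (teacher) checks the reasoning process step by step.
    out = llm("verifier", f"Check every step of this proof:\n{proof}")
    return Verdict("no flaw" in out, out)          # toy parsing for the sketch

def meta_verify(verdict: Verdict) -> bool:
    # The meta-verifier (supervisor) audits the verifier's feedback itself.
    out = llm("meta-verifier", f"Is this critique sound?\n{verdict.feedback}")
    return "sound" in out

def solve(problem: str, max_rounds: int = 3) -> str:
    proof = llm("generator", f"Prove rigorously:\n{problem}")
    for _ in range(max_rounds):
        # Self-evaluation before finalizing, as in the training approach above.
        self_check = llm("generator", f"Find mistakes in your own proof:\n{proof}")
        verdict = verify(proof)
        if verdict.valid and meta_verify(verdict):
            return proof
        # Honesty-reward idea: admitting and repairing an error scores better
        # than defending a flawed proof, so the generator revises.
        proof = llm("generator",
                    f"Revise the proof using this feedback:\n{verdict.feedback}\n{self_check}")
    return proof

print(solve("The sum of two even integers is even."))
```

The key design choice the summary highlights is that the meta-verifier grades the critique rather than the proof, which keeps the verifier honest as both roles improve together.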
Generative AI Empowers Requirements Engineering: A Transformation in Progress
机器之心· 2025-11-27 12:13
Core Insights
- The article presents a systematic literature review of the application of Generative AI (GenAI) in Requirements Engineering (RE), highlighting its transformative potential and the challenges that must be addressed for effective industrial adoption [4][51].

Research Growth
- Research on GenAI in RE has grown exponentially: the number of relevant papers increased from 4 in 2022 to 23 in 2023, and is projected to reach 113 in 2024 [10][8].
- A total of 238 papers were reviewed, indicating strong academic interest following the release of ChatGPT [8][10].

Research Focus Imbalance
- Research is heavily skewed toward certain phases of RE: 30% addresses requirements analysis, while only 6.8% addresses requirements management, indicating a lack of attention to complex socio-technical factors [11][9].
- GenAI in RE is in a "rapid expansion but immature" phase, with a large increase in quantity but insufficient research depth [14].

Technical Landscape
- The field relies heavily on the GPT model family, used by 67.3% of studies, which limits exploration of diverse technological paths [16].
- GPT-4 is primarily used for complex requirements analysis, while open-source alternatives such as CodeLlama remain underused despite their lower hallucination rates [17][16].

Challenges Identified
- The review identifies three core challenges: reproducibility (66.8%), hallucination (63.4%), and interpretability (57.1%), which are interrelated and must be addressed together [30][31].
- The lack of reproducibility is particularly problematic given the stochastic nature of large language models (LLMs) and their opaque APIs [30].

Evaluation Practices
- The RE field lacks standardized evaluation metrics: only 23.9% of studies release tools, and 45.8% use non-public datasets [35][37].
- Traditional NLP metrics dominate evaluation and fail to capture the complexity of RE tasks [33].

Industrial Adoption
- Industrial adoption of GenAI in RE lags behind: 90.3% of studies remain at the conceptual or prototype stage, and only 1.3% achieve production-level integration [39][41].
- Industry sees value in GenAI for accelerating requirements documentation and reducing communication costs, but companies hesitate over compliance and risk-control concerns [43].

Future Roadmap
- A four-phase strategy is proposed for advancing GenAI in RE: strengthening evaluation infrastructure, governance-aware development, scalable context-aware deployment, and industrial-level standardization [46].
- Key areas for improvement include generalization capability, data quality, and evaluation methods [45].

Recommendations for Researchers and Practitioners
- Researchers are encouraged to explore diverse models beyond GPT, develop RE-specific hybrid architectures, and prioritize reproducibility [53].
- Practitioners should use GenAI as an auxiliary tool rather than a decision-maker, especially for low-risk tasks [53].
Focusing on Young AI Talent | Registration Opens for the AI Industry Talent Forum at the 2025 Pudong International Talent Port Forum
机器之心· 2025-11-27 10:23
Core Viewpoint
- Artificial intelligence (AI) is recognized as a core driver of technological revolution and industrial transformation, and a key component of national competitiveness. The emphasis falls on young AI talents, who are seen as crucial for solving technical challenges and promoting the integration of AI across industries [2][4].

Event Overview
- The "Artificial Intelligence Industry Talent Forum" will be held on December 6 under the theme "Youth Empowerment, Intelligence Gathering in Pudong." The forum will bring together university professors, young scientists, entrepreneurial pioneers, and industry leaders to discuss AI talent cultivation, the evolution of embodied intelligence, and youth entrepreneurship [2][4].
- Pudong is positioned as the first national pilot zone for AI innovation applications, with the Zhangjiang AI Innovation Town at its core, aiming to build a global AI talent hub through policy and ecosystem construction [4].

Agenda Highlights
- Opening remarks by the host [5]
- Introduction of the Zhangjiang AI Innovation Town [5]
- Signing ceremony for youth-talent entrepreneurial enterprises [5]
- Keynote speech on cultivating top talents in the AI era by Wang Yanfeng, Executive Dean of the AI Institute at Shanghai Jiao Tong University [5]
- Panel discussion on the evolution of embodied intelligence and ecosystem construction [5][6]
- Dialogue among young talents on bridging theoretical advances with practical applications [6]
- Release of the AI industry talent development trend report by the Shanghai AI Industry Association and Shanghai Pudong Talent Development Co., Ltd. [6]

Guest Profiles
- Wang Yanfeng, a prominent figure in AI research and education, focusing on the intersection of AI with media and healthcare [11]
- Tan Yinliang, a professor with extensive experience in AI and the digital economy [12]
- Su Yang, co-founder and chief AI architect at Lingxin Qiaoshou, specializing in cutting-edge AI technology [13]
- Wang Hongtao, founder and CEO of Jingzhi Technology, with expertise in robotics and AI [14]
- Li Guanghui, founder and CEO of BraneMatrix AI, with a strong background in the internet and security industries [17]
- Chen Yuanpei, a young AI talent known for his work in reinforcement learning [18]
- Liu Bang, an associate professor focusing on natural language processing and embodied learning [19]
无问芯穹 (Wuwen Xinqiong) Completes Nearly 500 Million RMB A+ Financing Round, Doubling Down on Agentic Infra to Lead the Intelligent-Agent Industry Transformation
机器之心· 2025-11-27 10:23
Core Viewpoint
- The article highlights the recent completion of a nearly 500 million RMB A+ financing round by Wuwen Xinqiong, underscoring strong market confidence in its AI infrastructure capabilities and its alignment with national strategic initiatives [1][12].

Group 1: Financing and Investment
- Wuwen Xinqiong has raised nearly 500 million RMB in its A+ round, led by Zhuhai Technology Group and Futen Capital, with participation from several other investors [1].
- The round combines state-owned and market-driven investment, reflecting recognition of the company's commitment to technological innovation and its role in the AI industry [1][12].

Group 2: Company Overview and Offerings
- Founded two and a half years ago, Wuwen Xinqiong focuses on high-performance AI infrastructure, optimizing both software and hardware to address computational bottlenecks in AI applications [3][6].
- The company has built the "Wuqiong AI Cloud" and "Wuyin Terminal Intelligent Solutions," serving numerous leading AI enterprises and research institutions [3][9].

Group 3: Future Directions and Strategic Focus
- The new funding will go toward strengthening the company's technological advantages, expanding its AI cloud products and terminal solutions, and increasing investment in intelligent-infrastructure development [5][6].
- Wuwen Xinqiong aims to build a first-class intelligent service platform and supporting infrastructure to enable the widespread deployment of intelligent agents in both the digital and physical worlds [5][6].

Group 4: Technological Innovations
- The company has developed a comprehensive "Intelligent Agent Infrastructure" that integrates AI cloud and terminal intelligence, enabling significant advances in intelligent-agent applications [7][9].
- Recent product launches include Infra Agents for cloud infrastructure and Kernel Mind for terminal reasoning optimization, aimed at making intelligent agents a fundamental resource across industries [11][12].

Group 5: Market Position and Vision
- Wuwen Xinqiong positions itself as a leader in AI infrastructure, emphasizing the integration of the digital and physical worlds through intelligent agents [7][12].
- The company's strategic focus on building a robust agent ecosystem aligns with national goals for a more autonomous AI industry [13].
When Recommendation Systems Truly "Get You": Kuaishou Team Presents TagCF at NeurIPS 2025
机器之心· 2025-11-27 04:09
Core Insights
- The article discusses the development of a new recommendation-system framework called TagCF, which aims to add user understanding to content understanding, moving from "knowing what" to "understanding why" [2][43].

Group 1: Research Background and Motivation
- The research highlights a gap in traditional recommendation systems, which often focus solely on content without understanding user identities and roles [2][5].
- The TagCF framework was developed in collaboration among Kuaishou's algorithm team, its foundational model and application department, and Wuhan University [2][3].

Group 2: Methodology and Framework
- TagCF introduces two new tasks: User Role Identification, which models user characteristics and social roles, and Behavioral Logic Modeling, which captures the logical relationships between user roles and item topics (a minimal sketch follows this list) [9][10].
- The framework consists of three main modules: a video content understanding platform based on MLLM, a behavioral logic graph exploration platform, and a downstream recommendation-system enhancement [16][18][22].

Group 3: Experimental Results
- Experiments showed that user-role-based modeling statistically outperformed traditional topic modeling, yielding more stable and effective recommendations [7][40].
- The TagCF framework demonstrated significant improvements in recommendation accuracy and diversity, with the TagCF-it and TagCF-ut models achieving notable performance metrics [34][36].

Group 4: Challenges and Solutions
- Implementation faced challenges such as uncontrolled tag expansion and the need for precise scoring mechanisms [23][24].
- Solutions included constructing a cover set of high-frequency tags to ensure stability and generalizability in industrial applications [25][41].

Group 5: Conclusion and Future Directions
- The article concludes that TagCF represents a significant advance by integrating user understanding with content understanding, bridging the gap between statistical and symbolic modeling [43][45].
- Future work will refine the tag-logic system and explore applications across business scenarios, including e-commerce and search [44][45].
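The minimal sketch below, referenced in Group 2, shows one way a user-role-to-item-topic logic graph can be blended with a classic collaborative-filtering score. All tag names, weights, and the blending rule are illustrative assumptions, not Kuaishou's production system.

```python
# A minimal sketch of blending CF scores with a role -> topic logic graph,
# in the spirit of TagCF. Tags, weights, and blending are illustrative.
import numpy as np

rng = np.random.default_rng(0)
U, I, D = 100, 500, 16
user_emb = rng.normal(size=(U, D))          # learned user embeddings (stand-ins)
item_emb = rng.normal(size=(I, D))          # learned item embeddings (stand-ins)

roles = ["new parent", "fitness fan", "gadget lover"]        # user-role tags
topics = ["parenting", "workout", "tech review", "cooking"]  # item-topic tags
# Behavioral logic graph: edge weight = how strongly a role implies a topic.
logic = np.array([[0.9, 0.1, 0.2, 0.5],
                  [0.1, 0.9, 0.3, 0.4],
                  [0.1, 0.2, 0.9, 0.1]])

user_roles = rng.random((U, len(roles)))    # soft role tags per user (e.g. from an LLM tagger)
item_topics = rng.random((I, len(topics)))  # soft topic tags per item

def score(u: int, alpha: float = 0.5) -> np.ndarray:
    cf = item_emb @ user_emb[u]                      # classic CF relevance
    tag = item_topics @ logic.T @ user_roles[u]      # role -> topic logic relevance
    return alpha * cf + (1 - alpha) * tag            # blended ranking score

top5 = np.argsort(-score(0))[:5]
print("items recommended for user 0:", top5)
```

The tag path also makes recommendations explainable: a high `tag` score can be traced back through a specific role-to-topic edge, which is the "understanding why" the article emphasizes.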
Adam's Stability + Muon's Speed? Huawei Noah's Ark Open-Sources ROOT to Crack the "Want It All" Dilemma in Large-Model Training
机器之心· 2025-11-27 04:09
Core Viewpoint
- The article traces the evolution of optimizers for large language model (LLM) training and introduces ROOT (Robust Orthogonalized Optimizer) from Huawei Noah's Ark Lab, which combines the speed of Muon with the stability of Adam and addresses the limitations of existing optimizers in large-scale training and noise robustness [2][50].

Group 1: Optimizer Evolution
- The early optimizer SGD (stochastic gradient descent) established the basic paradigm of neural network training but struggled with convergence speed and stability in high-dimensional loss landscapes [6][7].
- Adam and AdamW became the de facto standards for training deep learning models, significantly improving convergence efficiency, but they exhibit numerical-stability issues once model parameters exceed one billion [7][8].
- Muon, a matrix-aware optimizer, addressed some of these issues by treating weight matrices as a whole, yet it faces challenges in robustness and sensitivity to noise [11][13].

Group 2: ROOT Optimizer Features
- ROOT improves the robustness of orthogonalized optimizers by introducing adaptive coefficients for the Newton-Schulz iteration, tailored to specific matrix dimensions, overcoming the dimensional fragility of fixed-coefficient methods [26][29].
- The optimizer uses a soft-thresholding mechanism to filter gradient noise, effectively separating normal from abnormal gradient components and improving training stability (a simplified sketch follows this list) [30][33].
- ROOT is designed to balance speed and stability, making it suitable for large-scale, non-convex training of real models [20][21].

Group 3: Performance Validation
- In extensive experiments, ROOT showed superior convergence, reaching a training loss of 2.5407 in a 10B-token pre-training run and outperforming Muon [41][42].
- ROOT achieved an average score of 60.12 across multiple downstream tasks, surpassing both AdamW (59.05) and Muon (59.59) [43].
- The optimizer also showed strong cross-modal generalization, reaching 88.44% Top-1 accuracy on CIFAR-10, significantly higher than Muon's 84.67% [46][47].

Group 4: Future Implications
- ROOT is positioned to usher in a new generation of optimizers for increasingly complex and large language models, improving the reliability and efficiency of AI system training [49][51].
- The open-source release of ROOT's code is expected to encourage further research and its application to training trillion-parameter models, reinforcing Huawei's commitment to AI innovation [52].
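The simplified sketch below, referenced in Group 2, combines Muon's publicly known Newton-Schulz orthogonalization with a generic soft-thresholding step standing in for ROOT's noise filtering. The fixed quintic coefficients are Muon's; ROOT's dimension-adaptive coefficients and exact thresholding rule are not reproduced here, and the threshold choice is an assumption.

```python
# Sketch: Muon-style Newton-Schulz orthogonalization plus a generic
# soft-threshold standing in for ROOT's noise filtering. ROOT's
# dimension-adaptive coefficients are NOT reproduced.
import torch

def soft_threshold(g, tau):
    # Shrink small (likely noisy) entries toward zero, keeping dominant signal.
    return torch.sign(g) * torch.clamp(g.abs() - tau, min=0.0)

def newton_schulz_orthogonalize(g, steps=5):
    a, b, c = 3.4445, -4.7750, 2.0315          # Muon's fixed quintic coefficients
    x = g / (g.norm() + 1e-7)                  # normalize so the iteration converges
    if transposed := x.size(0) > x.size(1):
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x    # quintic Newton-Schulz step
    return x.T if transposed else x

def root_like_update(weight, grad, lr=0.02):
    # Filter gradient noise first, then orthogonalize the remaining signal.
    tau = 0.1 * grad.abs().median()            # illustrative threshold choice
    weight -= lr * newton_schulz_orthogonalize(soft_threshold(grad, tau))

w, g = torch.randn(256, 512), torch.randn(256, 512)
g[torch.rand_like(g) < 0.01] += 5.0            # inject sparse outlier "noise"
root_like_update(w, g)
print(w.shape)
```

Thresholding before orthogonalization matters because the Newton-Schulz step equalizes singular values: without filtering, a few outlier gradient entries can be amplified into the update, which is the fragility the article says ROOT targets.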