机器之心
Over 1,100 models converge by different paths to a single "universal subspace": another win for Plato?
机器之心· 2025-12-14 04:53
Core Insights
- The importance of model architecture may exceed previous understanding, as a study from Johns Hopkins University reveals that over 1,100 different neural networks converge to a shared low-dimensional subspace, suggesting a "prior" mathematical structure that all neural networks approach [1][2][14].

Group 1: Findings and Implications
- This discovery helps explain several phenomena, such as why over-parameterized models can generalize, why different initializations lead to similar representations, and the effectiveness of techniques like LoRA and weight sharing [2][14].
- The research provides empirical evidence for the existence of a universal weight subspace hypothesis, indicating that all models may converge to a common subspace, which could limit diversity and introduce inherent biases [8][14][33].
- The study suggests that shared subspaces could enable large-scale model compression, rapid adaptation to new tasks, and insights into generalization boundaries and optimization landscapes [14][15].

Group 2: Methodology and Results
- The authors focused on LoRA adapters and observed the emergence of a universal subspace in the Mistral-7B model, extending the analysis to 500 Vision Transformers and 50 LLaMA3-8B models, all trained on different datasets and initializations [11][15].
- The analysis revealed that a shared low-rank structure exists across various tasks, with most information concentrated in 16 or fewer subspace directions, supporting the practical utility of the universal subspace [19][22].
- The universal subspace model demonstrated a 19-fold improvement in memory efficiency, as it eliminated the need to store all individual LoRA models [23].

Group 3: Theoretical Considerations
- The authors propose several theoretical factors contributing to the emergence of universal subspaces, including neural networks' preference for low-frequency functions, strong inductive biases imposed by modern architectures, and the universal nature of gradient-based optimization methods [36][37].
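To make the flavor of the methodology in Group 2 concrete, here is a minimal sketch (not the paper's actual pipeline) of how one might probe for a shared low-rank subspace across many LoRA adapters: stack the flattened weight updates, extract a common basis with truncated SVD, and measure how much of the variance the top 16 shared directions capture. All function names and the toy data below are illustrative assumptions.

```python
# Sketch: looking for a shared subspace across many LoRA update matrices.
import numpy as np

def shared_subspace(adapters, k=16):
    """adapters: list of (d_out, d_in) LoRA update matrices (delta W)."""
    X = np.stack([A.flatten() for A in adapters])      # (n_models, d_out*d_in)
    # Top-k right singular vectors span the candidate shared subspace.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]                                      # (k, d_out*d_in)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()     # variance captured
    return basis, explained

def project(adapter, basis):
    """Represent one adapter by k coefficients in the shared basis."""
    coeffs = basis @ adapter.flatten()                  # (k,)
    recon = (coeffs @ basis).reshape(adapter.shape)     # reconstruction
    return coeffs, recon

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy adapters that genuinely share a rank-16 structure plus noise.
    shared = rng.normal(size=(16, 64 * 32))
    adapters = [(rng.normal(size=16) @ shared).reshape(64, 32)
                + 0.01 * rng.normal(size=(64, 32)) for _ in range(100)]
    basis, explained = shared_subspace(adapters, k=16)
    print(f"variance explained by 16 shared directions: {explained:.3f}")
```

Storing only the shared basis plus k coefficients per adapter, rather than every full adapter, is the intuition behind the memory-efficiency claim above.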
Google co-founder Brin: after we published the Transformer paper, we didn't take it seriously enough
机器之心· 2025-12-14 04:53
Core Insights
- The article discusses the reflections of Sergey Brin, co-founder of Google, on the company's journey, its early decisions, and the future of education and research in the context of AI advancements [2][4][14].

Group 1: Google's Early Successes
- Google had a grand mission statement from the beginning, aiming to "organize the world's information," which provided a strong foundation for the company [4].
- The company was founded with a strong academic background, emphasizing fundamental research and development, which differentiated it from many startups at the time [5].
- Brin highlighted the importance of being willing to tackle difficult problems, especially in the context of AI, where the required computational power and advanced mathematics have become increasingly valuable [6].

Group 2: AI Development and Missed Opportunities
- Brin admitted that Google underestimated the significance of the Transformer paper released eight years ago, failing to invest adequately in scaling its computational resources [8].
- The company was hesitant to showcase its chatbot technology due to concerns about its performance, allowing competitors like OpenAI to capitalize on the opportunity [8].
- Despite past shortcomings, Google has a long history of investment in neural network research and has developed its own chips (TPUs) over the years, which positions it well in the AI landscape [10].

Group 3: Future of Education and Research
- Brin suggested that the concept of universities may need to evolve, as geographical limitations become less relevant in an era of rapid information dissemination and online learning [14].
- He expressed uncertainty about the traditional path from academia to industry, noting that the timeline for ideas to reach commercial viability has shortened significantly [17].
- Brin emphasized the ongoing importance of academic research, particularly in foundational and exploratory areas, which may still be better suited for academic environments despite the industrial advancements in AI [19].

Group 4: Emerging Technologies and Opportunities
- Brin identified materials science as a potentially underappreciated field with vast implications for both AI and quantum computing applications [27][28].
- He noted that while AI is currently a focal point, other areas such as synthetic biology and molecular sciences are also experiencing significant advancements that deserve attention [28].
An 8B model beating GPT-5? StepFun (阶跃星辰) open-sources a new Deep Think framework that unlocks million-token test-time compute for small models
机器之心· 2025-12-14 02:49
An 8B model beats GPT-5 on math competition tasks! StepFun (阶跃星辰) has officially released Parallel Coordinated Reasoning (PaCoRe), a new training and inference framework. Instead of being constrained by the context window size and processing speed of a single linear chain of thought, the model reasons through large-scale parallel coordination, thinking with unprecedented breadth and depth.

The high-performing Gemini Deep Think mode has only vaguely hinted that it scales test-time compute via "parallel thinking"; PaCoRe validates the effectiveness of scaling test-time compute at large scale with strong results, and fully open-sources the model, training data, and inference pipeline to accelerate research and innovation in this area. Built on this framework, even small models can unlock million-token test-time compute.

After large-scale outcome-based reinforcement learning (Outcome-based RL), the StepFun research team's PaCoRe-8B model learned to synthesize divergent reasoning trajectories. On the HMMT 2025 math benchmark it scored 94.5, surpassing GPT-5's 93.2. This result rests on the model's ability, when solving a single problem, to make effective use of up to two million Tok ...
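The general parallel-then-synthesize pattern that this kind of test-time scaling relies on can be sketched as follows. This is a minimal illustration under assumptions, not StepFun's released pipeline; `sample_trajectory`, `synthesize`, and the `model` callable are hypothetical placeholders.

```python
# Sketch: sample many reasoning trajectories in parallel, then let a
# coordination pass read them and produce a final answer.
from concurrent.futures import ThreadPoolExecutor

def sample_trajectory(model, problem, seed):
    # Placeholder: one independent chain-of-thought rollout.
    return model(f"Solve step by step (attempt {seed}): {problem}")

def synthesize(model, problem, rollouts):
    # Placeholder: a second pass that reconciles divergent rollouts.
    joined = "\n\n".join(f"Attempt {i}: {r}" for i, r in enumerate(rollouts))
    return model(f"{problem}\n\nCandidate solutions:\n{joined}\n\nFinal answer:")

def parallel_coordinated_reasoning(model, problem, n_rollouts=64):
    with ThreadPoolExecutor(max_workers=16) as pool:
        rollouts = list(pool.map(
            lambda s: sample_trajectory(model, problem, s), range(n_rollouts)))
    # Total test-time compute scales with n_rollouts * tokens per rollout,
    # while no single rollout has to fit the whole budget in its context.
    return synthesize(model, problem, rollouts)
```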
The end of human simultaneous interpreters? Google puts AI live translation into any pair of headphones, and releases a disruptive AI browser while it's at it
机器之心· 2025-12-14 02:49
Core Insights
- Google is accelerating the integration of its Gemini model capabilities into its core product line, particularly Google Translate, enhancing real-time voice translation and contextual understanding of text translations [2][5][8].

Group 1: Google Translate Enhancements
- Google Translate has introduced a new Beta feature that allows users to listen to real-time translations through any brand of headphones, transforming them into a simultaneous translation tool [5][6].
- The new feature supports over 70 languages and is currently available on the Android version of the Translate app, with plans to expand to iOS and more countries by 2026 [7].
- The Gemini model improves text translation by better understanding idioms and local expressions, providing contextually accurate translations rather than literal ones [8].

Group 2: Language Learning Tools
- Google is enhancing its translation app's language learning features to resemble professional language learning software, expanding to nearly 20 new countries/regions [9][11].
- New features include an improved feedback mechanism for speaking practice and a "Streak" function to encourage consistent learning habits [12].

Group 3: Experimental Browser - Disco
- Google Labs has launched an experimental browser named "Disco," which aims to redefine web browsing through a feature called "GenTabs" [3][14].
- GenTabs dynamically generates interactive interfaces based on user input and related web content, providing a more integrated browsing experience [15][16].
- Disco is currently in an experimental phase with a waiting list for the macOS version [17].
Will "Memory as a Context" redefine the Transformer's "memory paradigm"?
机器之心· 2025-12-14 01:30
Group 1
- The article discusses the concept of "Memory as a Context" and its potential to redefine the memory mechanisms of Transformers, addressing the limitations of current LLM memory capabilities [6][8].
- Google's Titans architecture introduces a neural long-term memory module that allows for online learning and optimization during testing, marking a shift from passive data storage to active learning [7][8].
- The Titans framework includes three architectural variants: "Memory as a Context," "Memory as a Gate," and "Memory as a Layer," each representing different approaches to integrating memory capabilities with Transformer models [7][8].

Group 2
- The article highlights the evolution of LLM memory mechanisms from static caches to adaptive test-time learning systems, enabling models to adjust memory strategies dynamically based on task requirements [9][10].
- A review of the past seven years of research on core memory operations (reading, writing, forgetting, and capacity management) reveals the limitations of static caching mechanisms and recent advancements in improving these operations [10].
- The research emphasizes the importance of selective writing, real-time decision-making, and adaptive resource allocation in enhancing the memory capabilities of Transformers [10].
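The "Memory as a Context" idea in Group 1 can be illustrated with a minimal sketch, assuming a heavily simplified interface rather than the actual Titans implementation: retrieved memory slots are simply prepended to the current segment before a standard attention pass, and the memory is then updated online from what was just processed. The slot count, update rule, and learning rate below are illustrative assumptions.

```python
# Sketch of a toy "Memory as a Context" block in PyTorch.
import torch
import torch.nn as nn

class MemoryAsContext(nn.Module):
    def __init__(self, d_model, n_mem_tokens=16, n_heads=4, mem_lr=0.1):
        super().__init__()
        self.register_buffer("memory", torch.zeros(n_mem_tokens, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_lr = mem_lr  # online update rate ("test-time learning")

    def forward(self, segment):                       # segment: (B, T, d_model)
        B = segment.size(0)
        mem = self.memory.unsqueeze(0).expand(B, -1, -1)
        ctx = torch.cat([mem, segment], dim=1)        # memory used *as context*
        out, _ = self.attn(ctx, ctx, ctx)
        out = out[:, self.memory.size(0):]            # keep the segment positions
        with torch.no_grad():                         # toy online memory update
            summary = segment.mean(dim=(0, 1))        # (d_model,)
            self.memory.mul_(1 - self.mem_lr).add_(self.mem_lr * summary)
        return out
```

The contrast with a static KV cache is that the memory state here changes as inference proceeds, which is the shift from passive storage to active learning described above.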
ACL Fellows 2025 announced: Westlake University's Yue Zhang and UIUC's Heng Ji selected
机器之心· 2025-12-13 08:31
Core Viewpoint
- The ACL has announced the list of 2025 ACL Fellows, recognizing significant contributions in the field of Natural Language Processing (NLP) [1].

Group 1: Overview of ACL Fellows
- A total of 11 scholars have been selected as ACL Fellows in 2025, with notable inclusions of two Chinese scholars: Heng Ji from the University of Illinois Urbana-Champaign and Yue Zhang from Westlake University [1].

Group 2: Heng Ji's Contributions
- Heng Ji is recognized for her important contributions in information extraction, multimodal and multilingual knowledge extraction, and "AI for Science" [6].
- She holds multiple positions at the University of Illinois, including Professor of Computer Science and Director of the Amazon-Illinois Interactive Dialogue Experience AI Center [7].
- Her research interests focus on NLP, particularly multimedia multilingual information extraction and knowledge-enhanced large language models [8].

Group 3: Yue Zhang's Contributions
- Yue Zhang is acknowledged for his contributions to structured prediction and generalization in NLP, as well as his service to the NLP community and education [12].
- He has held various academic positions, including a tenure as an Associate Professor at Singapore University of Technology and Design [11].
- His research interests include NLP and underlying machine learning algorithms, with a focus on the differences between neural language models and human cognition [13].

Group 4: Other Notable Fellows
- Rada Mihalcea is recognized for her contributions in NLP, multimodal processing, and computational social science, including the development of the TextRank algorithm [16].
- Mohit Bansal is acknowledged for his work in question-answering systems, scientific applications, and multimodal AI [20].
- Saif Mohammad is recognized for his pioneering contributions in knowledge-based NLP and commonsense reasoning [31].
- Lori Levin is acknowledged for her work in computational emotion science and responsible NLP [36].
- Alexander Koller is recognized for foundational contributions in computational semantics and neural-symbolic architectures [43].
NeurIPS 2025 | Goodbye to full-dataset scans! Zhejiang University proposes COIDO to crack the high-cost problem of multimodal data selection
机器之心· 2025-12-13 08:31
Core Insights
- The article introduces COIDO (Coupled Importance-Diversity Optimization), a framework designed to optimize data selection for visual instruction tuning in multi-modal large language models (MLLMs) [4][9][23].
- COIDO aims to reduce the computational costs associated with data selection while ensuring high-quality data is retained, addressing the challenges of existing methods that often require full data traversal [12][23].

Group 1: Motivation and Background
- The rapid growth of datasets, such as LLaVA-665K, has led to significant computational overhead and redundancy when fine-tuning MLLMs on full datasets [8].
- Existing data selection methods face two main issues: high selection costs and the decoupling of importance and diversity in data selection [12][9].

Group 2: Methodology
- COIDO introduces a lightweight scoring mechanism that allows for training on a small sample (e.g., 20%) of the full dataset, enabling generalization without the need for full data traversal [14].
- The core innovation of COIDO is the coupled optimization of importance and diversity within a unified training framework, rather than treating them as separate phases [14].
- The importance loss is based on a reweighted cross-entropy loss, while the diversity loss utilizes spectral clustering to minimize variance among clusters, ensuring a diverse data selection [14][15].

Group 3: Experimental Results
- COIDO achieves state-of-the-art performance using only 20% of the data, reaching 98.2% of the performance of full-data fine-tuning across various benchmarks [20][21].
- The framework demonstrates strong generalization and transferability, outperforming models trained from scratch on new datasets [21].

Group 4: Conclusion
- COIDO presents a novel paradigm for multi-modal data selection, challenging the notion that data selection must be costly and providing a pathway for efficient fine-tuning of MLLMs [23][24].
- The framework's low computational cost and high-quality data selection make it a valuable tool for researchers with limited resources [23].
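The coupled objective described in Group 2 can be sketched in a single loss function: an importance term that reweights per-sample cross-entropy with learned selection weights, and a diversity term that penalizes concentrating the selection mass on a few clusters. This is an illustration in the spirit of COIDO, not the paper's actual losses; the softmax weighting, variance penalty, and `lam` coefficient are assumptions.

```python
# Sketch of a coupled importance-diversity selection objective.
import torch
import torch.nn.functional as F

def coupled_selection_loss(scores, ce_per_sample, cluster_ids, n_clusters, lam=0.1):
    """scores: raw outputs of a lightweight scorer, shape (N,);
    ce_per_sample: per-sample cross-entropy of the target model, shape (N,);
    cluster_ids: spectral-cluster assignment per sample, long tensor (N,)."""
    w = torch.softmax(scores, dim=0)                       # soft selection weights
    importance = (w * ce_per_sample).sum()                 # reweighted cross-entropy
    one_hot = F.one_hot(cluster_ids, n_clusters).float()   # (N, K)
    cluster_mass = one_hot.t() @ w                         # selection mass per cluster
    diversity = cluster_mass.var()                         # low variance = even coverage
    return importance + lam * diversity                    # single coupled objective
```

Because both terms share the same weights `w`, importance and diversity are optimized jointly rather than in separate phases, which is the coupling the summary refers to.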
Saining Xie's REPA gets a major upgrade with fewer than 4 lines of code
机器之心· 2025-12-13 04:59
Core Insights
- The article discusses the importance of spatial structure over global semantic information in representation alignment for generative models, specifically in the context of diffusion models [1][3][42].

Group 1: Research Findings
- A joint team from Adobe Research, Australian National University, and New York University conducted empirical analysis on 27 different visual encoders and model sizes [2].
- The unexpected result revealed that spatial structure, rather than global performance, drives the generative performance of target representations [3][8].
- The study introduced the concept of Spatial Self-Similarity to quantify spatial structure, which measures the clarity of "texture" and "relationships" in feature maps [15][17].

Group 2: iREPA Methodology
- The team developed a simple method called iREPA, which can enhance the convergence speed of various visual encoders and training variants [5][20].
- iREPA's core modifications include replacing the MLP projection layer with a convolutional layer to better preserve local spatial relationships and introducing a spatial normalization layer to enhance spatial contrast [20][21][22].

Group 3: Performance Improvements
- iREPA demonstrated significant improvements in convergence speed across various diffusion transformers and visual encoders, proving its robustness and general applicability [26][27].
- The method showed that as the model size increases, the performance gains from iREPA also increase, aligning with the "Scaling Law" trend [34].
- Visual quality improvements were evident, with iREPA-generated images exhibiting better object outlines, texture details, and overall structural coherence compared to standard REPA [36].

Group 4: Conclusion
- The research emphasizes that understanding spatial relationships between pixels is more crucial for generative models than merely focusing on a single metric like ImageNet accuracy [42].
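The two modifications described in Group 2 can be sketched as a drop-in projection head: a convolution that mixes local spatial neighborhoods instead of a token-wise MLP, followed by per-channel normalization over the spatial grid to preserve spatial contrast. This is a minimal sketch under assumptions (grid shape, kernel size, and normalization details are illustrative, not the authors' released code).

```python
# Sketch of an iREPA-style projection head: local conv + spatial normalization.
import torch
import torch.nn as nn

class ConvProjector(nn.Module):
    def __init__(self, d_in, d_out, grid=16):
        super().__init__()
        self.grid = grid                                   # assumes H*W == grid*grid tokens
        self.proj = nn.Conv2d(d_in, d_out, kernel_size=3, padding=1)  # local mixing

    def forward(self, tokens):                             # tokens: (B, H*W, d_in)
        B, N, C = tokens.shape
        x = tokens.transpose(1, 2).reshape(B, C, self.grid, self.grid)
        x = self.proj(x)                                   # (B, d_out, H, W)
        # Spatial normalization: standardize each channel over the H*W grid,
        # which sharpens relative differences between spatial locations.
        mu = x.mean(dim=(2, 3), keepdim=True)
        sd = x.std(dim=(2, 3), keepdim=True) + 1e-6
        x = (x - mu) / sd
        return x.flatten(2).transpose(1, 2)                # back to (B, H*W, d_out)
```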
AAAI 2026 Oral | No more one-size-fits-all! AdaMCoT teaches large models to tailor their approach to each problem by dynamically choosing the best language to think in
机器之心· 2025-12-13 04:59
Multilingual large language models (MLLMs) often face a dilemma on multilingual tasks: answer directly in the original language, or translate into a high-resource language before reasoning? In practice, different languages carry different "specialties" inside the model. English, for example, may be more logically structured and better suited to scientific reasoning, while Chinese or Indonesian may have an edge over English on tasks involving specific cultural context or rhyming.

How can a model automatically choose the reasoning path that suits each task best? A research team led by Nancy F. Chen and Ai Ti Aw at Singapore's Agency for Science, Technology and Research (A*STAR), together with Prof. Roy Ka-Wei Lee's team at the Singapore University of Technology and Design (SUTD), has introduced the AdaMCoT (Adaptive Multilingual Chain-of-Thought) framework. The core idea of AdaMCoT is to treat "which language to think in" as an optimizable decision variable: the model adaptively routes across multiple languages and combines their chains of thought, then maps the reasoning back into the target language, significantly improving the accuracy and consistency of cross-lingual factual reasoning. The work has been accepted as an Oral paper in the AAAI 2026 main track. A rough sketch of this routing pattern follows below.

Background and pain points

Existing cross-lingual reasoning methods typically suffer from "path dependence": either they reason directly with no special handling, which easily causes hallucinations in low-resource languages, or they force everything to be converted ...
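As referenced above, the routing idea, treating "which language to think in" as a decision to be made per input, can be sketched as follows. This is a minimal sketch under assumptions, not the AdaMCoT training recipe; the `model` and `router` callables, the candidate languages, and the prompts are hypothetical.

```python
# Sketch: route the reasoning language per question, then map the answer back.
def adaptive_multilingual_cot(model, router, question, target_lang,
                              candidates=("en", "zh", "id")):
    # Score each candidate reasoning language for this particular question.
    scores = {lang: router(question, lang) for lang in candidates}
    reasoning_langs = sorted(scores, key=scores.get, reverse=True)[:2]
    # Reason in the selected languages and combine the chains of thought.
    chains = [model(f"Think step by step in {lang}: {question}")
              for lang in reasoning_langs]
    combined = "\n\n".join(chains)
    # Map the combined reasoning back into the target language for the answer.
    return model(f"Given this reasoning:\n{combined}\n"
                 f"Answer in {target_lang}: {question}")
```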
GPT-5.2 has been live for 24 hours: a flood of negative reviews!
机器之心· 2025-12-13 04:59
机器之心 report. Editor: Yang Wen.

Users are complaining that GPT-5.2 "doesn't understand people." X is flooded with harsh reviews of GPT-5.2.

Yesterday, on OpenAI's tenth anniversary, the company unveiled its newest flagship series, GPT-5.2, officially billed as "the most capable model family for professional knowledge work to date." GPT-5.2 also set new SOTA results across many benchmarks.

| Benchmark | GPT-5.2 Thinking | GPT-5.1 Thinking |
| --- | --- | --- |
| GDPval (wins or ties), knowledge work tasks | 70.9% | 38.8% (GPT-5) |
| SWE-Bench Pro (public), software engineering | 55.6% | 50.8% |
| SWE-bench Verified, software engineering | 80.0% | 76.3% |
| GPQA Diamond (no tools), science questions | 92.4% | 88.1% |
| Ch ...