Representation Learning
Tian Yuandong's 2025 Year-End Summary: On Being Laid Off and Research Directions for 2026
自动驾驶之心· 2026-01-06 00:28
Core Insights
- The article recounts the author's experience of being laid off, along with the complexities of project pressure and personal career decisions in AI and machine learning research [3][4][5]

Group 1: Project Management and Challenges
- The author faced significant pressure when asked to assist with the Llama4 project, leading to a difficult decision that meant weighing potential outcomes against personal integrity [3]
- Despite the challenges, the author made progress in core areas of reinforcement learning, including training stability and model architecture design, which helped shift his research perspective [3]

Group 2: Career Decisions and Transitions
- After more than a decade at the company, the author contemplated leaving, influenced by economic and personal factors, but ultimately decided to stay, reflecting how difficult such transitions are [4]
- Navigating workplace ups and downs provided valuable material for future writing, blending professional and personal growth [5]

Group 3: Research Directions
- Two main research directions are planned for 2026: large-model reasoning and understanding the "black box" of models, a line of work that gained traction after the release of the author's continuous latent space reasoning work (a minimal sketch of this idea follows this list) [6]
- Efforts to improve reasoning efficiency include several approaches, such as using discrete tokens and parallel reasoning chains, which have shown promising results in cutting computational cost while improving performance [7]

Group 4: Interpretability and Future Directions
- The author emphasizes the importance of interpretability in AI, arguing that understanding how AI systems work is crucial for the ethical and effective use of the technology [10]
- Efforts to demystify the model training process are still at an early stage, with a focus on deriving principles from first principles to guide the design of future AI models [11]
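For readers unfamiliar with the continuous latent space reasoning idea referenced above (the Coconut line of work), here is a minimal illustrative sketch, not the author's actual implementation. It assumes a HuggingFace-style causal language model that accepts `inputs_embeds` and returns hidden states; the function name and loop structure are hypothetical.

```python
import torch

def continuous_latent_reasoning(model, input_embeds, n_latent_steps=4):
    """Sketch of Coconut-style latent reasoning: instead of decoding a
    discrete token at each step, the last hidden state is fed back as
    the next input embedding, so the chain of thought stays in
    continuous latent space rather than being forced through tokens."""
    for _ in range(n_latent_steps):
        out = model(inputs_embeds=input_embeds, output_hidden_states=True)
        # final layer's hidden state at the last position, shape (B, 1, D)
        next_embed = out.hidden_states[-1][:, -1:, :]
        # append it as the next "token" embedding
        input_embeds = torch.cat([input_embeds, next_embed], dim=1)
    return input_embeds  # switch back to ordinary token decoding from here
```

The appeal of this scheme, as the summary notes, is efficiency: several reasoning steps happen without sampling, and parallel chains can be batched.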
A Bombshell for Vector Retrieval! Fu Cong and Zhejiang University Release the IceBerg Benchmark: HNSW Is Not Optimal, and the Evaluation System Is Seriously Biased
量子位· 2025-12-25 11:51
Core Insights
- The integration of multimodal data into RAG and agent frameworks is a hot topic in the LLM application field, and vector retrieval is the most natural recall method for multimodal data [1]
- There is a misconception that vector retrieval methods have been standardized, particularly around the use of HNSW, which does not perform well in many downstream tasks [1]
- A new benchmark called IceBerg evaluates vector retrieval algorithms on downstream semantic tasks rather than traditional metrics like Recall-QPS, challenging long-held industry assumptions [1]

Group 1: Misconceptions in Vector Retrieval
- Many practitioners treat vector retrieval as a standardized, solved component and default to HNSW without checking its performance on real-world tasks [1]
- Past evaluation systems only scratch the surface of the complexities involved in vector retrieval [1]
- A significant gap exists between the perceived effectiveness of vector retrieval methods and their actual performance on downstream tasks (see the sketch after this list) [7]

Group 2: Case Studies and Findings
- On a large-scale face verification dataset (Glint360K), face recognition accuracy saturated before Recall reached 99%, indicating a disconnect between distance-metric recall and actual task performance [5]
- NSG, a state-of-the-art vector retrieval algorithm, shows clear advantages in distance-metric recall but underperforms RaBitQ on downstream semantic tasks [5]
- Different metric spaces can lead to vastly different downstream outcomes, highlighting the importance of metric selection in vector retrieval [6]

Group 3: Information Loss and Model Limitations
- An information-loss funnel model illustrates how information is lost at each stage of the embedding pipeline, producing discrepancies between expected and observed outcomes [7]
- The capacity of the representation model directly affects embedding quality, with generalization error and the learning objective both impacting performance [10][11]
- Many models do not explicitly optimize for a good metric space, which can cause significant information loss during embedding [13]

Group 4: Metric and Algorithm Selection
- The choice of metric (Euclidean vs. inner product) can substantially change results, especially with generative representation models [15]
- Vector retrieval methods, which fall into space-partitioning and graph-based indexing families, perform differently depending on data distribution [17]
- The IceBerg benchmark reshuffles the ranking of vector retrieval algorithms, showing that HNSW is not always the top performer on downstream tasks [18]

Group 5: Automation and Future Directions
- IceBerg provides an automated algorithm-selection tool that helps users choose a suitable method without deep background knowledge [21]
- Statistical indicators can reveal how well an embedding fits a given metric and algorithm, enabling automated decision-making [23]
- The research team calls for future vector retrieval studies to focus on task-metric compatibility and the development of unified vector retrieval algorithms [25]
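To make the recall-versus-downstream distinction concrete, here is a minimal sketch of the two evaluation axes. The function and variable names are illustrative, not from the IceBerg codebase; it assumes you already have candidate IDs from an ANN index, exact nearest-neighbor IDs from brute force, and integer label arrays for a classification-style downstream task such as face identification.

```python
import numpy as np

def recall_at_k(ann_ids, exact_ids, k=10):
    """Distance-metric view: how many of the true k nearest neighbors
    did the approximate index actually return?"""
    hits = [len(set(a[:k]) & set(e[:k])) / k
            for a, e in zip(ann_ids, exact_ids)]
    return float(np.mean(hits))

def downstream_accuracy(ann_ids, query_labels, base_labels, k=10):
    """Task view: does a majority vote over the retrieved neighbors'
    labels recover the query's true label?"""
    preds = [np.bincount(base_labels[a[:k]]).argmax() for a in ann_ids]
    return float(np.mean(np.array(preds) == np.asarray(query_labels)))
```

An index can score higher on `recall_at_k` yet lower on `downstream_accuracy` than a competitor, which is exactly the kind of gap the IceBerg benchmark is designed to surface.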
AI4S Rising Stars Gather at "SAIS Talk: 上智院 Starry Night": Five Frontier Talks Await You
机器之心· 2025-09-24 07:48
Core Insights
- The article emphasizes the role of the younger generation in driving innovation in artificial intelligence, particularly in scientific research [2]
- The Shanghai Institute of Scientific Intelligence (上智院) is highlighted as the world's first research institute focused on AI for Science, aiming to transform scientific research paradigms and empower various industries [2]
- The SAIS Talk event showcases promising young researchers sharing innovative work in scientific intelligence, signaling a vibrant future for AI in scientific discovery [3]

Group 1: Event Overview
- SAIS Talk has successfully held 15 sessions, featuring speakers from diverse backgrounds, from top scholars to active researchers, to foster inspiration and collaboration [3]
- The September 26 event will feature five young researchers discussing topics including representation learning, catalytic reaction prediction, and global weather forecasting [3]

Group 2: Research Highlights
- Research on hierarchical spatiotemporal representation and cross-scale implicit autoregressive modeling significantly improves long-term prediction accuracy for dynamical systems [5]
- The RXNGraphormer framework unifies chemical reaction performance prediction and synthesis planning, achieving leading performance across multiple prediction tasks [10]
- A 4D diffusion-model framework for protein dynamics and conformation generation offers a new computational paradigm for understanding protein function and accelerating drug design [13]
- The SCRIPT framework for single-cell gene-regulatory relationship prediction delivers more than a twofold improvement in long-range regulatory prediction, with implications for genetic diagnosis of complex diseases [17]
- FuXi-Weather, a machine-learning-based global weather forecasting system, outperforms traditional numerical weather prediction systems in sparsely observed regions [21]
Kaiming He Improves on Saining Xie's REPA: Drastically Simplified, Yet Performance Remains Strong
机器之心· 2025-06-12 09:57
Core Viewpoint
- The article discusses the significance of representation learning in generative models, introducing a new method called Dispersive Loss that integrates self-supervised learning into diffusion-based generative models without requiring additional pre-training or external data sources [6][9][43]

Group 1: Diffusion Models and Representation Learning
- Diffusion models excel at modeling complex data distributions but have remained largely disconnected from the representation learning field [2]
- The training objectives of diffusion models typically focus on reconstruction tasks such as denoising and lack explicit regularization of the learned representations [3]
- Representation learning, particularly self-supervised learning, is crucial for learning general representations applicable to a variety of downstream tasks [4]

Group 2: Introduction of Dispersive Loss
- Dispersive Loss is a flexible, general plug-in regularizer that integrates self-supervised learning into diffusion-based generative models [9]
- Its core idea is to add a regularization objective on the model's internal representations, encouraging them to spread out in the latent space (a minimal sketch follows this list) [10][13]
- The method requires no additional layers or parameters, making it simple and self-contained [15][16]

Group 3: Comparison with Existing Methods
- Dispersive Loss operates without pre-training, external data, or additional model parameters, unlike REPA, which relies on pre-trained models [7][41][43]
- The method demonstrates that representation learning can benefit generative modeling without external information sources [13][43]
- In practice, introducing Dispersive Loss requires only minimal adjustments, such as specifying which intermediate layers to regularize [29]

Group 4: Performance Evaluation
- Experimental results show that Dispersive Loss consistently outperforms the corresponding contrastive losses while avoiding the complexities of dual-view sampling [33]
- The method has been tested across models including DiT and SiT, improving results in all settings, particularly for larger models where effective regularization matters most [36][37]
- Dispersive Loss also generalizes to one-step diffusion-based generative models, indicating its versatility [44]
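Based on the description above, a minimal PyTorch sketch of an InfoNCE-style dispersive regularizer might look like the following. The temperature value, weighting, and layer choice are assumptions for illustration, not the paper's reference implementation.

```python
import torch

def dispersive_loss(z, tau=0.5):
    """Sketch of an InfoNCE-style dispersive regularizer: with no positive
    pairs, only the repulsive term remains, so minimizing this pushes the
    batch's internal representations apart in latent space."""
    z = z.flatten(1)                        # (B, D) intermediate activations
    sq_dist = torch.cdist(z, z, p=2) ** 2   # pairwise squared L2 distances
    # log-mean-exp of negative distances; smaller when points are spread out
    return torch.log(torch.exp(-sq_dist / tau).mean())

# Plug-in usage: add to the standard denoising objective with a weight lam,
# applied to activations from a chosen intermediate layer.
# loss = diffusion_loss + lam * dispersive_loss(intermediate_activations)
```

Note how this matches the article's claims: no extra parameters, no second augmented view of each sample, and no pre-trained teacher; the regularizer reads activations the model already computes.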
The State of Core Technologies in China's Multimodal Large Model Industry in 2025: The Keys Are Representation, Translation, Alignment, Fusion, and Collaborative Learning [Infographics]
Qian Zhan Wang· 2025-06-03 05:12
Core Insights
- The article discusses the core technologies of multimodal large models: representation learning, translation, alignment, fusion, and collaborative learning [1][2][7][11][14]

Representation Learning
- Representation learning is fundamental to multimodal tasks, addressing challenges such as combining heterogeneous data and handling the varying noise levels of different modalities [1]
- Before Transformers, different modalities required distinct representation learning models, such as CNNs for computer vision (CV) and LSTMs for natural language processing (NLP) [1]
- The emergence of Transformers enabled the unification of multiple modalities and cross-modal tasks, leading to a surge in multimodal pre-training models after 2019 [1]

Translation
- Cross-modal translation maps a source modality to a target modality, for example generating descriptive sentences from images or vice versa [2]
- Syntactic templates allow structured predictions in which specific slots are filled in based on detected attributes [2]
- Encoder-decoder architectures encode source-modality data into latent features, which a decoder then uses to generate the target modality [2]

Alignment
- Alignment is crucial in multimodal learning, establishing correspondences between data from different modalities to improve understanding of complex scenarios [7]
- Explicit alignment categorizes instances with multiple components and measures similarity, using both unsupervised and supervised methods [7][8]
- Implicit alignment leverages latent representations for tasks without strict alignment, improving performance in applications such as visual question answering (VQA) and machine translation [8]

Fusion
- Fusion combines multimodal data or features for unified analysis and decision-making, improving task performance by integrating information from multiple modalities [11]
- Early fusion merges inputs at the feature level, while late fusion combines outputs at the decision level; hybrid fusion mixes both approaches (see the sketch after this list) [11][12]
- The choice of fusion method depends on the task and data, with neural networks now a popular approach to multimodal fusion [12]

Collaborative Learning
- Collaborative learning uses data from one modality to improve the model of another, and is categorized into parallel, non-parallel, and hybrid methods [14][15]
- Parallel learning requires direct associations between observations from different modalities, while non-parallel learning relies on overlapping categories [15]
- Hybrid methods connect modalities through shared datasets, allowing one modality to influence the training of another across a variety of tasks [15]
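As a concrete illustration of the early-versus-late distinction described under Fusion, here is a minimal PyTorch sketch. The two-modality setup, feature dimensions, and equal decision weighting are hypothetical choices for illustration only.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Feature-level fusion: concatenate modality features, decide jointly."""
    def __init__(self, d_img=512, d_txt=256, n_classes=10):
        super().__init__()
        self.head = nn.Linear(d_img + d_txt, n_classes)

    def forward(self, img_feat, txt_feat):
        # one classifier sees both modalities at once
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

class LateFusion(nn.Module):
    """Decision-level fusion: per-modality classifiers, averaged at the end."""
    def __init__(self, d_img=512, d_txt=256, n_classes=10):
        super().__init__()
        self.img_head = nn.Linear(d_img, n_classes)
        self.txt_head = nn.Linear(d_txt, n_classes)

    def forward(self, img_feat, txt_feat):
        # each modality votes independently; decisions are combined
        return 0.5 * (self.img_head(img_feat) + self.txt_head(txt_feat))
```

Early fusion lets the model exploit cross-modal feature interactions, while late fusion degrades more gracefully when one modality is missing or noisy; hybrid schemes combine both paths.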