Representation Learning

机器之心· 2025-09-24 07:48

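Published by 机器之心. By the 机器之心 editorial team.

SAIS Talk is a frontier-technology sharing series hosted by SAIS (上智院). Fifteen sessions have been held to date, with speakers from varied backgrounds, ranging from top scholars who have been deeply involved in Nobel Prize selection to doctoral students active on the research front line, with the aim of sparking ideas and building a shared ecosystem.

On the evening of September 26, five young researchers working in general-purpose technology, physical sciences, life sciences, earth sciences, and related areas will take the stage in turn to share their core work and original thinking. Topics include representation learning, catalytic reaction prediction, dynamic simulation of biomolecules, single-cell atlases, and global weather forecasting. As AI merges deeply with science and young researchers take on frontier problems, we may witness the birth of an "AI Einstein" in the not-too-distant future. You are warmly invited to join the most energetic forces in AI and welcome the coming golden age of scientific discovery.

(To attend in person at SAIS on the West Bund in Xuhui, Shanghai, send your name, affiliation, and mobile number to sais@sais.org.cn. Registration closes at 18:00 on September 25; confirmations will be sent by email before 21:00.)

Agenda: each talk includes 5-10 minutes of discussion.

As the global wave of artificial intelligence surges ahead, the core driving force of innovation increasingly comes from the younger generation. They dare to challenge the frontier, are unafraid of failure, and are redefining the paradigm of scientific discovery through cross-disciplinary integration. Having long focused on scientific intelligence (AI for S ...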

Kaiming He improves on Saining Xie's REPA: greatly simplified, yet performance remains strong

机器之心· 2025-06-12 09:57

Core Viewpoint
- The article discusses the significance of representation learning in generative models, particularly through the introduction of a new method called Dispersive Loss, which integrates self-supervised learning into diffusion-based generative models without requiring additional pre-training or external data sources [6][9][43].

Group 1: Diffusion Models and Representation Learning
- Diffusion models excel in modeling complex data distributions but are largely disconnected from the representation learning field [2].
- The training objectives of diffusion models typically focus on reconstruction tasks, such as denoising, lacking explicit regularization for learned representations [3].
- Representation learning, particularly self-supervised learning, is crucial for learning general representations applicable to various downstream tasks [4].

Group 2: Introduction of Dispersive Loss
- Dispersive Loss is a flexible and general plug-in regularizer that integrates self-supervised learning into diffusion-based generative models [9].
- The core idea of Dispersive Loss is to introduce a regularization target for the model's internal representations, encouraging them to spread out in the latent space [10][13].
- This method does not require additional layers or parameters, making it a simple and independent approach [15][16].

Group 3: Comparison with Existing Methods
- Dispersive Loss operates without the need for pre-training, external data, or additional model parameters, unlike the REPA method, which relies on pre-trained models [7][41][43].
- The new method demonstrates that representation learning can benefit generative modeling without external information sources [13][43].
- In practical applications, introducing Dispersive Loss requires minimal adjustments, such as specifying the intermediate layers for regularization [29].

Group 4: Performance Evaluation
- Experimental results show that Dispersive Loss consistently outperforms corresponding contrastive losses while avoiding the complexities of dual-view sampling [33].
- The method has been tested across various models, including DiT and SiT, showing improvements in all scenarios, particularly in larger models where effective regularization is crucial [36][37].
- The article highlights that Dispersive Loss can be generalized for one-step diffusion-based generative models, indicating its versatility [44].
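The summary above says that Dispersive Loss pushes a model's internal representations apart in latent space, but it does not state the formula. The following is a minimal sketch, assuming an InfoNCE-style form: the log of the batch mean of exp(-squared L2 distance / tau) over all pairs of representations. The function name `dispersive_loss`, the temperature `tau`, and the toy batches are illustrative assumptions, not the authors' implementation.

```python
import math


def dispersive_loss(reps, tau=0.5):
    """Assumed InfoNCE-style repulsion term (illustrative, not the paper's code).

    reps: list of equal-length feature vectors, one per sample in the batch.
    Returns log(mean over all pairs (i, j) of exp(-||z_i - z_j||^2 / tau)).
    The value is at most 0 and decreases as representations move apart.
    """
    n = len(reps)
    total = 0.0
    for zi in reps:
        for zj in reps:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            total += math.exp(-sq_dist / tau)
    return math.log(total / (n * n))


# Hypothetical intermediate-layer activations for two small batches.
clustered = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
spread = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]

# The spread-out batch gets the lower (more negative) loss.
print(dispersive_loss(clustered), dispersive_loss(spread))
```

Because the term needs only the representations already in the batch, it adds no layers or parameters. In a training loop it would presumably be added to the usual denoising loss with a weighting coefficient (a hypothetical hyperparameter here) and computed on the activations of a chosen intermediate layer.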

Dispersive Loss

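Status of core technologies in China's multimodal large model industry in 2025: the keys are representation, translation, alignment, fusion, and collaboration technologies [Image Gallery]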

Qian Zhan Wang· 2025-06-03 05:12

Core Insights
- The article discusses the core technologies of multimodal large models, focusing on representation learning, translation, alignment, fusion, and collaborative learning [1][2][7][11][14].

Representation Learning
- Representation learning is fundamental for multimodal tasks, addressing challenges such as combining heterogeneous data and handling varying noise levels across different modalities [1].
- Prior to the advent of Transformers, different modalities required distinct representation learning models, such as CNNs for computer vision (CV) and LSTMs for natural language processing (NLP) [1].
- The emergence of Transformers has enabled the unification of multiple modalities and cross-modal tasks, leading to a surge in multimodal pre-training models post-2019 [1].

Translation
- Cross-modal translation aims to map source modalities to target modalities, such as generating descriptive sentences from images or vice versa [2].
- The use of syntactic templates allows for structured predictions, where specific words are filled in based on detected attributes [2].
- Encoder-decoder architectures are employed to encode source modality data into latent features, which are then decoded to generate the target modality [2].

Alignment
- Alignment is crucial in multimodal learning, focusing on establishing correspondences between different data modalities to enhance understanding of complex scenarios [7].
- Explicit alignment involves categorizing instances with multiple components and measuring similarity, utilizing both unsupervised and supervised methods [7][8].
- Implicit alignment leverages latent representations for tasks without strict alignment, improving performance in applications like visual question answering (VQA) and machine translation [8].

Fusion
- Fusion combines multimodal data or features for unified analysis and decision-making, enhancing task performance by integrating information from various modalities [11].
- Early fusion merges features at the feature level, while late fusion combines outputs at the decision level, with hybrid fusion incorporating both approaches [11][12].
- The choice of fusion method depends on the task and data, with neural networks becoming a popular approach for multimodal fusion [12].

Collaborative Learning
- Collaborative learning utilizes data from one modality to enhance the model of another modality, categorized into parallel, non-parallel, and hybrid methods [14][15].
- Parallel learning requires direct associations between observations from different modalities, while non-parallel learning relies on overlapping categories [15].
- Hybrid methods connect modalities through shared datasets, allowing one modality to influence the training of another, applicable across various tasks [15].
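The difference between early (feature-level) and late (decision-level) fusion described above can be illustrated with a toy two-modality classifier. This is a sketch under stated assumptions: the logistic scorer, the function names, and all feature and weight values are hypothetical and chosen only to show where the combination step happens.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features, weights):
    """Dot product of a feature vector with a weight vector."""
    return sum(f * w for f, w in zip(features, weights))


def early_fusion(image_feats, text_feats, joint_weights):
    """Feature-level fusion: concatenate modality features, then apply one model."""
    fused = image_feats + text_feats  # list concatenation
    return sigmoid(linear_score(fused, joint_weights))


def late_fusion(image_feats, text_feats, image_weights, text_weights):
    """Decision-level fusion: one model per modality, then average the outputs."""
    p_image = sigmoid(linear_score(image_feats, image_weights))
    p_text = sigmoid(linear_score(text_feats, text_weights))
    return 0.5 * (p_image + p_text)


# Hypothetical features and weights for a single example.
image_feats = [0.2, 0.8]
text_feats = [0.5]

p_early = early_fusion(image_feats, text_feats, [1.0, -0.5, 2.0])
p_late = late_fusion(image_feats, text_feats, [1.0, -0.5], [2.0])
print(p_early, p_late)
```

Early fusion lets a single model learn interactions across modalities, while late fusion keeps the per-modality models independent and merges only their predictions. A hybrid scheme would combine both paths.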