Self-Supervised Learning
Quantitative Special Report: "Machine Learning" Stock Selection Model Series (I): Construction and Initial Application of the Price-Volume Fingerprint Model
GOLDEN SUN SECURITIES· 2026-01-16 13:34
Securities Research Report | Financial Engineering | 2026-01-16 | Quantitative Special Report: "Machine Learning" Stock Selection Model Series (I): Construction and Initial Application of the Price-Volume Fingerprint Model

(1) Minute-level feature preprocessing: 32 minute-level features are selected, covering price features (e.g., high, low, close, intrabar price position) and trading features (e.g., turnover, order placement/cancellation, money flow). Each group is standardized separately to remove the effects of scale and historical volatility.
(2) Dual-task learning framework: a forward causal prediction task (predicting price features) and a backward feature reconstruction task (reconstructing trading features) force the model to learn the dynamic semantics and causal structure of the market's price-volume relationship, producing a 128-dimensional daily fingerprint vector.
(3) Anti-collapse design: diversity, orthogonality, and uniformity regularization terms ensure the fingerprint vectors are highly discriminative, low in redundancy, and information-rich, avoiding representation collapse.

Initial end-to-end application of the price-volume fingerprint model: we feed the "price-volume fingerprint" as input features into a GRU model to predict future stock returns. Test results show:
(1) A factor trained only on the price-volume fingerprints already has some predictive power. Over 2017/01/01-2025/12/31, the factor's weekly RankIC mean is 0.106, the annualized return of the market-wide 10-group long-short hedge is 83.88%, the information ratio is 5.41, and the maximum drawdown ...
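The report only names its three anti-collapse regularizers. Below is a hedged sketch of what diversity, orthogonality, and uniformity penalties on a batch of daily fingerprint vectors could look like; the loss forms and weights are assumptions borrowed from common representation-learning practice, not the report's implementation.

```python
import torch
import torch.nn.functional as F

def anti_collapse_penalty(z, w_div=1.0, w_orth=1.0, w_unif=1.0):
    """Hypothetical anti-collapse regularizer for a batch of fingerprint vectors z: (B, D)."""
    z = F.normalize(z, dim=-1)                          # work on the unit hypersphere
    # Diversity: keep per-dimension std away from zero (variance term, VICReg-style)
    std = z.std(dim=0)
    div = F.relu(1.0 - std).mean()
    # Orthogonality: penalize off-diagonal entries of the feature covariance (redundancy)
    zc = z - z.mean(dim=0, keepdim=True)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    orth = (off_diag ** 2).sum() / z.shape[1]
    # Uniformity: spread embeddings over the sphere (Wang & Isola-style pairwise term)
    sq_dists = torch.pdist(z, p=2) ** 2
    unif = torch.log(torch.exp(-2.0 * sq_dists).mean())
    return w_div * div + w_orth * orth + w_unif * unif
```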
A Facial Robot Makes the Cover of Science Robotics: Using AI to Teach a Bionic Facial Robot to "Speak"
机器之心· 2026-01-15 04:31
Hu Yuhang (online alias "U航"), a PhD graduate of Columbia University in the US, is the founder of 首形科技. He has long focused on research into robot self-learning, with results published in top international journals including Nature Machine Intelligence and Science Robotics. His work aims to endow robots with a "self-model": an internal representation of their own physical structure and motion, so that robots can better understand themselves and adapt to changing morphologies, environments, and tasks. In bionic human-robot interaction, he has proposed an integrated emotion understanding and expression system that fuses speech, vision, and motion, giving robots more natural interaction capabilities. Through self-supervised learning, his methods let robots keep improving the quality of human-robot interaction without human intervention, moving steadily toward agents with lifelong learning ability.

Paper: https://www.science.org/doi/10.1126/scirobotics.adx3017

Previously published papers:

On January 15, 2026, a breakthrough study from Columbia Engineering in the US was officially published in Science Robotics and featured on the journal's cover. The study demonstrates a new robotics technology: a humanoid robot with a bionic facial structure that, via deep learning, produces realistic lip movements synchronized with speech and songs. Following human speech, it can precisely open ...
Medical Imaging Diagnosis May Bid Farewell to the "Manual Annotation Era"
Huanqiu Wang Zixun· 2026-01-07 01:18
In chest X-ray experiments, AFLoc outperformed existing methods on multiple lesion-localization metrics in tests covering 34 common chest diseases, including pneumonia, pleural effusion, and pneumothorax, across 8 mainstream public datasets, and for several diseases it reached or even exceeded the level of human experts. In fundus-image and pathology-image tasks, AFLoc likewise showed stable lesion-localization ability, with localization accuracy better than current mainstream models.

Beyond lesion localization, AFLoc also demonstrates strong disease-diagnosis capability. In zero-shot classification tasks on chest X-ray, fundus, and histopathology images, its overall performance exceeds existing methods. In fundus retinopathy diagnosis in particular, AFLoc's zero-shot classification performance even surpasses some models fine-tuned on manually annotated data.

On January 6, a team led by researcher Wang Shanshan of the State Key Laboratory of Medical Imaging Science and Technology Systems at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, together with Tsinghua University assistant professor Zhou Hongyu, Macau University of Science and Technology professor Zhang Kang, and other collaborators, published their latest results in Nature Biomedical Engineering. The team proposed an AI model named AFLoc that can automatically "find lesions" in medical images without doctors annotating them in advance.

The team explained: "We have AFLoc learn two types of information at the same time. One is the medical images themselves, such as chest X-rays, fundus photographs, or pathology slides; the other is the clinical reports written by doctors. Through repeated 'contrastive learning', AFLoc gradually comes to understand that the disease descriptions mentioned in a clinical report ...
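The article describes AFLoc's image-report "contrastive learning" only at a high level. As a hedged illustration of the general technique (a CLIP-style symmetric contrastive objective, not AFLoc's actual multi-level architecture; the encoders producing the embeddings are placeholders), a minimal sketch:

```python
import torch
import torch.nn.functional as F

def image_report_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning each image with its paired clinical-report embedding.
    img_emb, txt_emb: (B, D) outputs of an image encoder and a text encoder (placeholders)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.shape[0], device=img.device)
    # each image should match its own report, and each report its own image
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```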
Can Autoregression Also Build Strong Vision Models? NEPA Ushers in the "Next-Embedding Prediction" Era, with Saining Xie Participating
机器之心· 2026-01-02 05:00
Core Viewpoint
- The article discusses a new approach in visual pre-training called Next-Embedding Predictive Autoregression (NEPA), which shifts the paradigm from learning representations to learning models, demonstrating strong performance in visual tasks similar to language models [2][18].

Group 1: NEPA Overview
- NEPA is a minimalist approach that predicts the next feature block of an image, akin to how language models predict the next word [20].
- The method utilizes causal masking and stop-gradient techniques to ensure stable predictions without requiring complex architectures [17][25].
- NEPA has shown competitive performance on benchmarks like ImageNet-1K, achieving Top-1 accuracy of 83.8% for ViT-B and 85.3% for ViT-L, surpassing several state-of-the-art methods [29].

Group 2: Methodology and Architecture
- The architecture employs a standard vision Transformer (ViT) backbone with causal attention masking, directly predicting future image-patch embeddings from past embeddings [22].
- Unlike pixel-level reconstruction methods, NEPA does not require a separate decoder, simplifying the model design [22].
- The training process involves segmenting images into patches, encoding them into vectors, and predicting the next patch while preventing the model from "cheating" through stop-gradient techniques [25].

Group 3: Performance and Applications
- NEPA demonstrates strong transfer capabilities, achieving 48.3% and 54.0% mIoU on the ADE20K semantic segmentation task, indicating its ability to learn the rich semantic features needed for dense prediction tasks [29].
- The model can be adapted to various downstream tasks by simply changing the classification head, showcasing its versatility [30].
- Visual analysis reveals that NEPA learns long-range, object-centered attention patterns, effectively ignoring background noise and focusing on semantically relevant areas [37].
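A hedged, toy sketch of the next-embedding objective described above: causal attention runs over patch embeddings, and each position is trained to predict the gradient-stopped embedding of the next patch. Module names, sizes, and the MSE loss are assumptions, not NEPA's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextEmbeddingPredictor(nn.Module):
    """Toy next-embedding objective: a causal Transformer over image-patch embeddings."""
    def __init__(self, dim=256, depth=4, heads=8, patch_dim=768):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)                     # flattened patch -> embedding
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)

    def forward(self, patches):                                    # patches: (B, N, patch_dim)
        z = self.embed(patches)                                    # (B, N, dim)
        mask = nn.Transformer.generate_square_subsequent_mask(z.shape[1]).to(z.device)
        h = self.backbone(z, mask=mask)                            # causal attention over patches
        pred = h[:, :-1]                                           # predictions for positions 1..N-1
        target = z[:, 1:].detach()                                 # stop-gradient on the target embedding
        return F.mse_loss(pred, target)                            # "predict the next embedding"
```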
LeCun's Last Paper at Meta
36Kr· 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on a self-supervised learning method called LeJEPA, which is seen as his farewell work at Meta as he departs the company [1][33].
- LeJEPA introduces a new framework that enhances predictive performance by ensuring the embedding space follows a specific statistical distribution [2].

Group 1: LeJEPA Framework
- LeJEPA is based on isotropic Gaussian embedding and addresses the representation collapse issue in traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework utilizes Sketched Isotropic Gaussian Regularization (SIGReg) to achieve distribution matching, transforming the problem into a statistical hypothesis test [6][11].

Group 2: Experimental Validation
- Extensive experiments were conducted on large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- Results indicate that LeJEPA outperforms existing methods while maintaining training simplicity and robustness, particularly on domain-specific datasets like Galaxy10 and Food101 [10].

Group 3: Statistical Insights
- The research highlights that isotropic Gaussian distribution minimizes bias and variance during training, enhancing stability and accuracy in downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, confirming the superiority of isotropic Gaussian distribution through various experiments [3].

Group 4: Future Directions
- Despite LeCun's departure from Meta, it is suggested that he is raising funds to establish a startup focused on advancing his work in world models, indicating ongoing contributions to the AI field [33][34].
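Group 3 claims that isotropic Gaussian embeddings give lower bias and variance for downstream tasks than non-isotropic ones. A tiny simulation of the variance half of that claim under assumed Gaussian embeddings with equal total variance; this is an illustration of the intuition, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials, noise = 32, 200, 500, 0.5
w_true = rng.normal(size=d)

def probe_weight_variance(eigvals):
    """Average variance of OLS linear-probe weights when embeddings ~ N(0, diag(eigvals))."""
    estimates = []
    for _ in range(trials):
        X = rng.normal(size=(n, d)) * np.sqrt(eigvals)     # embeddings with given per-dim variance
        y = X @ w_true + rng.normal(scale=noise, size=n)   # downstream target
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])
    return np.var(np.stack(estimates), axis=0).mean()

iso = np.ones(d)                        # isotropic: equal variance in every direction
aniso = np.linspace(0.05, 1.95, d)      # anisotropic, same total variance (mean eigenvalue = 1)
print("isotropic probe-weight variance:  ", probe_weight_variance(iso))
print("anisotropic probe-weight variance:", probe_weight_variance(aniso))
```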
LeCun's Last Paper at Meta? He Is Also Co-First Author. LeJEPA: Completing the Theoretical Puzzle of JEPAs
机器之心· 2025-11-14 01:33
Core Viewpoint
- The article discusses the development of LeJEPA, a new self-supervised learning framework that addresses the limitations of existing Joint Embedding Predictive Architectures (JEPAs) by providing a solid theoretical foundation and eliminating reliance on heuristic methods [4][5][8].

Group 1: Theoretical Foundation
- The research team established that the optimal embedding distribution for JEPAs is an isotropic Gaussian distribution, which minimizes downstream prediction risk across various tasks [5].
- A novel distribution-matching objective called Sketched Isotropic Gaussian Regularization (SIGReg) was introduced to efficiently enforce the embedding to conform to the ideal isotropic Gaussian distribution [6][8].
- LeJEPA combines the predictive objectives of JEPA with SIGReg, resulting in a statistically optimal solution that mitigates representation collapse [8][9].

Group 2: Practical Implementation
- LeJEPA demonstrates simplicity, robustness, and high performance due to its principled theoretical design, which eliminates the need for complex heuristics like gradient stopping and teacher-student networks [9][11].
- The implementation of LeJEPA requires only about 50 lines of code in PyTorch, making it user-friendly and easy to deploy [11][19].

Group 3: Experimental Validation
- LeJEPA was tested across over 10 datasets and 60 architectures, achieving or surpassing state-of-the-art results, such as 79% accuracy on ImageNet-1K with ViT-H/14 [10].
- The framework showed superior performance on domain-specific datasets, outperforming DINOv2-based transfer learning, indicating its capability for in-domain pre-training [10][33].

Group 4: Stability and Scalability
- LeJEPA maintains stability across different hyperparameters and architectures, with recommended settings yielding competitive performance even with small batch sizes [24][26].
- The framework's design is architecture-agnostic, allowing it to learn high-quality representations across various model types [26][27].

Group 5: Semantic Structure Emergence
- LeJEPA's self-supervised learning gave rise to semantic structure without explicit supervision, as evidenced by attention patterns that correspond to object boundaries and salient regions [41][43].
- The attention maps demonstrated temporal consistency, enabling unsupervised video segmentation, indicating that the learned features capture both spatial semantics and temporal structure [43].
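The article notes the reference implementation is roughly 50 lines of PyTorch. Below is a much smaller, hedged skeleton of the two-term objective (a prediction loss between embeddings of two views plus a regularizer pulling embeddings toward isotropy). The names `encoder`, `predictor`, and `lam`, and the crude variance/covariance stand-in for SIGReg, are assumptions, not the released code.

```python
import torch
import torch.nn.functional as F

def isotropy_penalty(z):
    """Stand-in regularizer pushing embeddings toward zero mean and identity covariance
    (a crude proxy for SIGReg's isotropic-Gaussian matching, not the paper's method)."""
    zc = z - z.mean(dim=0, keepdim=True)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    eye = torch.eye(cov.shape[0], device=z.device)
    return ((cov - eye) ** 2).mean() + z.mean(dim=0).pow(2).mean()

def lejepa_style_step(encoder, predictor, view_a, view_b, lam=0.05):
    """One hypothetical training step: predict view-b embeddings from view-a embeddings
    and regularize the embedding distribution; no stop-gradient or teacher network needed."""
    za, zb = encoder(view_a), encoder(view_b)              # (B, D) each
    pred_loss = F.mse_loss(predictor(za), zb)              # JEPA-style prediction in embedding space
    reg_loss = isotropy_penalty(torch.cat([za, zb], dim=0))
    return pred_loss + lam * reg_loss
```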
LeCun's Last Paper at Meta
量子位· 2025-11-13 11:52
Core Insights
- The article discusses the introduction of LeJEPA, a self-supervised learning method developed by Yann LeCun, marking his farewell from Meta [2][3][4].
- LeJEPA aims to address the representation collapse issue in traditional JEPA frameworks by utilizing isotropic Gaussian embeddings and introducing SIGReg regularization to enhance model generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is based on isotropic Gaussian embeddings, which effectively mitigate the representation collapse problem and significantly improve model generalization capabilities [5].
- The traditional JEPA framework often encounters representation collapse, where models map all inputs to a single point, hindering the capture of semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzed the impact of the embedding distribution on bias and variance through ordinary least squares regression, revealing that an isotropic Gaussian distribution minimizes both during training [8][9].
- An isotropic Gaussian distribution ensures lower bias and variance compared to non-isotropic distributions, enhancing stability and accuracy in downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) is introduced as a method to achieve distribution matching, transforming the problem into a hypothesis-testing framework [15][17].
- It employs a combination of univariate directional tests and Epps-Pulley tests to assess the match between the embedding distribution and the target isotropic Gaussian distribution [16][17].

Group 4: High-Dimensional Challenges
- SIGReg addresses the computational challenges of high-dimensional spaces; combining SIGReg with the predictive loss enables efficient and stable mini-batch training [19][21].
- The total loss in LeJEPA is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ balancing their contributions [22].

Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, demonstrated that LeJEPA outperforms existing methods while maintaining training simplicity and robustness [20][23].
- On domain-specific datasets like Galaxy10 and Food101, LeJEPA surpassed DINOv2-based transfer learning methods when pre-trained directly on target data [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over the three years since its introduction by LeCun, focusing on enhancing model expressiveness and reasoning capabilities through joint prediction methods [31][28].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions for y [32].

Group 7: Future Directions
- Although LeJEPA marks the end of LeCun's research at Meta, it does not mark the conclusion of JEPA's development, as LeCun is reportedly raising funds to establish a startup focused on world models [72][71].
- LeCun's departure from Meta, while not entirely graceful, caps a significant period of achievement in AI research that has contributed to the field's advancement [74][79].
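The summary describes SIGReg as combining univariate directional tests with Epps-Pulley tests and folding the result into the loss as a weighted sum with the predictive term. A minimal, hedged sketch of that idea follows: embeddings are projected onto random unit directions, and each 1-D projection's empirical characteristic function is compared against that of N(0, 1) on a frequency grid. This is a simplified approximation (no proper weighting function), and the paper's exact statistic may differ.

```python
import torch
import torch.nn.functional as F

def epps_pulley_1d(x, ts):
    """Epps-Pulley-style distance between the empirical characteristic function of a 1-D
    sample x and the characteristic function of N(0, 1), on a frequency grid ts."""
    angles = ts[:, None] * x[None, :]                       # (T, B)
    ecf_re, ecf_im = torch.cos(angles).mean(dim=1), torch.sin(angles).mean(dim=1)
    gauss_cf = torch.exp(-0.5 * ts ** 2)                    # CF of the standard normal (real-valued)
    return ((ecf_re - gauss_cf) ** 2 + ecf_im ** 2).mean()

def sigreg_sketch(z, num_dirs=64, num_freqs=17):
    """Sketched isotropic-Gaussian regularizer: project embeddings z (B, D) onto random unit
    directions and penalize each projection's deviation from N(0, 1)."""
    dirs = F.normalize(torch.randn(num_dirs, z.shape[1], device=z.device), dim=-1)
    proj = z @ dirs.T                                        # (B, num_dirs) 1-D projections
    ts = torch.linspace(0.1, 3.0, num_freqs, device=z.device)
    return torch.stack([epps_pulley_1d(proj[:, k], ts) for k in range(num_dirs)]).mean()

# total LeJEPA-style loss, with lambda_ balancing the two terms as described in the article:
# loss = prediction_loss + lambda_ * sigreg_sketch(embeddings)
```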
Despite the Turmoil at Meta, LeCun Keeps Publishing. New Work: JEPAs Don't Just Learn Features, They Can Also Precisely Perceive Data Density
36Kr· 2025-10-09 11:39
Core Insights
- Yann LeCun's team has discovered that self-supervised JEPA (Joint Embedding Predictive Architecture) models have a hidden ability to learn data density, i.e., how common or typical a data sample is [1][3].
- This finding challenges the long-held belief that JEPAs only learn features and are unrelated to data density [3][4].

Summary by Sections

JEPAs Overview
- JEPAs are a self-supervised learning framework that can autonomously learn feature patterns from vast amounts of data without manual labeling, making them efficient for tasks like image recognition and cross-modal matching [6][10].

Key Findings
- The key discovery is that JEPAs accurately learn data density through their anti-collapse mechanism, which was previously thought only to prevent feature collapse [8][10].
- The model's ability to perceive data density is a necessary outcome of its training: to satisfy the training constraints, it must respond to small changes in the samples [8][10].

Practical Application
- The team introduced a tool called JEPA-SCORE, which quantifies data density by scoring the commonality of samples: a higher score indicates a more typical sample, while a lower score suggests rarity or anomaly [10][11].
- JEPA-SCORE is versatile and can be applied across various datasets and JEPA architectures without additional training [10][11].

Experimental Validation
- Experiments demonstrated that JEPA-SCORE effectively identifies typical and rare samples both in datasets like ImageNet and in unfamiliar datasets, confirming its reliability and general applicability [11][13].

Research Team
- The research was a collaborative effort involving four core researchers from Meta's FAIR, including Randall Balestriero, Nicolas Ballas, and Michael Rabbat, each with significant backgrounds in AI and deep learning [20][22][23].
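The article does not give the formula behind JEPA-SCORE. Purely as a hedged illustration of what "scoring how typical a sample is" can look like in a learned embedding space, the sketch below uses a simple k-nearest-neighbor density proxy over frozen-encoder embeddings; this is a stand-in technique, not the paper's actual score.

```python
import torch

@torch.no_grad()
def knn_typicality_score(query_emb, reference_embs, k=20):
    """Higher score = the sample sits in a denser region of embedding space (more 'typical').
    query_emb: (D,) embedding of one sample; reference_embs: (N, D) embeddings of a reference set,
    both produced by a frozen encoder (placeholder)."""
    dists = torch.cdist(query_emb[None, :], reference_embs)[0]   # (N,) distances to the reference set
    knn_dist = dists.topk(k, largest=False).values.mean()        # mean distance to the k nearest neighbors
    return -knn_dist                                             # negate: closer neighbors -> higher score
```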
Foundation Models for Autonomous Driving Should Be Capability-Oriented, Not Confined to the Methods Themselves
自动驾驶之心· 2025-09-16 23:33
Core Insights
- The article discusses the transformative impact of foundational models on the autonomous driving perception domain, shifting from task-specific deep learning models to versatile architectures trained on vast and diverse datasets [2][4].
- It introduces a new classification framework focusing on four core capabilities essential for robust performance in dynamic driving environments: general knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning [2][5].

Group 1: Introduction and Background
- Autonomous driving perception is crucial for enabling vehicles to interpret their surroundings in real time, involving key tasks such as object detection, semantic segmentation, and tracking [3].
- Traditional models, designed for specific tasks, exhibit limited scalability and poor generalization, particularly in "long-tail scenarios" where rare but critical events occur [3][4].

Group 2: Foundational Models
- Foundational models, developed through self-supervised or unsupervised learning strategies, leverage large-scale datasets to learn general representations applicable across various downstream tasks [4][5].
- These models demonstrate significant advantages in autonomous driving due to their inherent generalization capabilities, efficient transfer learning, and reduced reliance on labeled datasets [4][5].

Group 3: Key Capabilities
- The four key dimensions for designing foundational models tailored to autonomous driving perception are:
  1. General knowledge: the ability to adapt to a wide range of driving scenarios, including rare situations [5][6]
  2. Spatial understanding: deep comprehension of 3D spatial structures and relationships [5][6]
  3. Multi-sensor robustness: maintaining high performance under varying environmental conditions and sensor failures [5][6]
  4. Temporal reasoning: capturing temporal dependencies and predicting future states of the environment [6]

Group 4: Integration and Challenges
- The article outlines three mechanisms for integrating foundational models into autonomous driving technology stacks: feature-level distillation, pseudo-label supervision, and direct integration [37][40].
- It highlights the challenges faced in deploying these models, including the need for effective domain adaptation, addressing hallucination risks, and ensuring efficiency in real-time applications [58][61].

Group 5: Future Directions
- The article emphasizes the importance of advancing research on foundational models to enhance their safety and effectiveness in autonomous driving systems, addressing current limitations and exploring new methodologies [2][5][58].
SceneSplat: Scene Understanding and Vision-Language Pretraining Built on 3DGS, the Leap That Lets 3D Gaussians "Understand Human Language"
机器之心· 2025-09-07 08:21
Core Insights
- The article introduces SceneSplat, the first end-to-end large-scale 3D indoor scene understanding method that operates natively on 3D Gaussian Splatting (3DGS) scenes [2][6].
- A self-supervised learning scheme is proposed to unlock rich 3D feature learning from unlabelled scenes, addressing the lack of models that can independently handle 3D data for semantic learning [2][6].
- The SceneSplat-7K dataset is created, consisting of 7,916 scenes sourced from seven existing datasets, enabling effective training and testing of the SceneSplat model [2][6].

Dataset Construction
- SceneSplat-7K includes 7,916 processed 3DGS scenes and a total of 11.27 billion Gaussian points, with an average of approximately 1.42 million points per scene [6][7].
- The dataset's construction required computational resources equivalent to 150 days of running on L4 GPUs, ensuring high reconstruction quality with a PSNR of 29.64 dB and an average Depth-L1 of 0.035 m [6][7].

Semantic Annotation
- A stable and fast pipeline is used to annotate semantic information in 3DGS, employing SAMv2 for object-level segmentation and SigLIP2 for extracting vision-language features [8][10].
- The pre-trained encoder learns rich semantic representations based solely on 3DGS parameters and neighborhood information, eliminating the need for 2D fusion during inference [8][10].

Training Methodology
- Two training routes are provided: vision-language pre-training for labelled data and self-supervised training for unlabelled data, maximizing the learning potential of unlabelled scenes [12][14].
- The model employs a hierarchical Transformer architecture, using Gaussian tokens and neighborhood attention to achieve effective semantic-vector regression [15].

Experimental Results
- The SceneSplat method achieves state-of-the-art (SOTA) results in zero-shot semantic segmentation on datasets like ScanNet200, ScanNet++, and Matterport3D [21][22].
- Quantitative experiments demonstrate significant improvements in mean Intersection over Union (mIoU) and mean accuracy (mAcc) across various datasets, showcasing the model's robustness [22][23].

Future Work
- The SceneSplat-7K dataset is being expanded to SceneSplat-49K, with ongoing benchmarking of 3DGS and semantic integration across multiple datasets [31].
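A hedged sketch of how zero-shot semantic labeling can work once each Gaussian carries a language-aligned feature: embed the class names with a text encoder (SigLIP2 in the article) and assign each Gaussian the class with the highest cosine similarity. Function and variable names, and the prompt template, are placeholders rather than SceneSplat's exact pipeline.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_gaussian_labels(gaussian_feats, class_names, text_encoder):
    """gaussian_feats: (G, D) language-aligned features predicted for each 3D Gaussian.
    text_encoder: any callable mapping a list of strings to (C, D) embeddings (placeholder)."""
    text_feats = F.normalize(text_encoder([f"a photo of a {c}" for c in class_names]), dim=-1)
    gauss = F.normalize(gaussian_feats, dim=-1)
    sims = gauss @ text_feats.T                     # (G, C) cosine similarities
    return sims.argmax(dim=-1)                      # per-Gaussian class index
```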