Self-Supervised Learning
A New "Drift" Paradigm for Models: Kaiming He's New Work Frees Generative Models from Iterative Inference
机器之心· 2026-02-08 10:37
Training a generative model is a complex undertaking. At bottom, generative modeling is a process of step-by-step distribution fitting. Unlike the familiar discriminative models, which map a single sample to its corresponding label, a generative model maps one distribution onto another.

Start from the most familiar case, diffusion models. Diffusion models, together with related flow-based methods, typically characterize the mapping from noise to data via differential equations (stochastic differential equations, SDEs, or ordinary differential equations, ODEs). But training and running a diffusion model is time-consuming and compute-hungry, because its core computation is an iterative process.

To push generative-model efficiency as far as possible, a large body of work aims to reduce the number of diffusion steps. One representative line is distillation, which compresses a pretrained multi-step model into a single-step model. Another line tries to train single-step diffusion models from scratch. For example:

Variational autoencoders (VAEs) are trained by optimizing the evidence lower bound (ELBO), which consists of a reconstruction loss and a KL-divergence term. With a Gaussian prior, the classic VAE is itself a one-step generator. In today's mainstream applications, however, VAEs usually adopt priors learned by diffusion or autoregressive models, in which case the VAE acts more as a tokenizer.

Normalizing flows (NFs) learn a mapping from data to noise and are trained by maximizing the log-likelihood of samples. These methods ...
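To make the "iterative process" concrete, here is a minimal sketch of sampling by Euler integration of a probability-flow ODE. The velocity field `v` below is a toy stand-in for a trained network, and `n_steps` is exactly the quantity that distillation and one-step methods try to drive down (each step costs one network evaluation).

```python
import numpy as np

def sample_ode(v, x_noise: np.ndarray, n_steps: int) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps.
    Every step costs one evaluation of the network v, which is why reducing
    n_steps (distillation, one-step models) matters so much for efficiency."""
    x = x_noise.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * v(x, t)
    return x

# Toy linear velocity field: transports any x toward the point `target`.
target = np.array([1.0, -2.0])
v = lambda x, t: (target - x) / (1.0 - t + 1e-3)  # stand-in for a trained network

x0 = np.array([5.0, 5.0])                          # "noise" starting point
x1 = sample_ode(v, x0, n_steps=100)
print(np.round(x1, 2))  # close to `target` after many small steps
```

A one-step generator replaces this whole loop with a single forward pass, which is precisely what the distillation and from-scratch one-step lines of work aim for.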
Quantitative Special Report: "Machine Learning" Stock-Selection Model Series (1): Construction and Preliminary Application of the Volume-Price Fingerprint Model
GOLDEN SUN SECURITIES· 2026-01-16 13:34
Quantitative Models and Construction Methods

- **Model Name**: Volume-Price Fingerprint Model
  - **Model Construction Idea**: Inspired by large language models, the volume-price fingerprint model treats market transaction data as a special "language" and uses self-supervised learning to extract semantic features from intraday volume-price behavior [1][8][9]
  - **Model Construction Process**:
    1. **Minute-level Feature Preprocessing**: Select 32 dimensions of minute-level features, including price features (e.g., high, low, close, price position) and transaction features (e.g., turnover, order cancellation, fund flow). Standardize these features to eliminate dimensional and historical-volatility effects [1][16][17]
       - Price feature standardization formula: $$\tilde{p}_{t,d}=\frac{p_{t,d}}{p_{\mathrm{open}}}-1$$ [16]
       - Transaction feature standardization formula: $${\tilde{f}}_{t,d}={\frac{f_{t,d}}{S_{d}}},\quad S_{d}={\frac{1}{N_{\mathrm{hist}}}}\sum_{i=1}^{N_{\mathrm{hist}}}\sum_{t=1}^{T}f_{t,d}^{(i)}$$ [17]
    2. **Dual-task Self-supervised Learning Framework**:
       - **Forward Causal Prediction Task**: Predict price features causally from past transaction and price information; a triangular attention mask enforces strict causality [18][21]
       - **Backward Feature Reconstruction Task**: Randomly mask transaction features and reconstruct them from global sequence information [18][22]
    3. **Anti-collapse Regularization**: Introduce diversity, orthogonality, and uniformity regularization terms to ensure high differentiation, low redundancy, and rich information in the fingerprint vectors [1][43][44][46]
       - Total loss function: $${\mathcal{L}}_{\mathrm{total}}=\lambda_{f}{\mathcal{L}}_{\mathrm{forward}}+\lambda_{b}{\mathcal{L}}_{\mathrm{backward}}+{\mathcal{L}}_{\mathrm{diversity}}+{\mathcal{L}}_{\mathrm{orthogonality}}+{\mathcal{L}}_{\mathrm{uniformity}}$$ [47]
  - **Model Evaluation**: The model provides a structured representation of market dynamics, capturing semantic features beyond traditional numerical predictions [1][9][14]

- **Model Name**: GRU Model with Volume-Price Fingerprint
  - **Model Construction Idea**: Use a GRU to predict future stock returns from volume-price fingerprint features [2][50][51]
  - **Model Construction Process**:
    1. Input features include 128-dimensional volume-price fingerprint vectors and basic daily features (e.g., open, high, low, close, volume, turnover) [51][52]
    2. GRU structure: two-layer GRU + fully connected layer + LayerNorm + ReLU + dropout + fully connected layer [53]
    3. Training details: batch size of 512, learning rate of 1e-4 with warmup, early stopping after 10 rounds without improvement [53][54]
  - **Model Evaluation**: The GRU model effectively exploits the semantic features of volume-price fingerprints for stock prediction, outperforming traditional factor-based models on certain metrics [2][50][54]

- **Model Name**: Dual-stream GRU Model
  - **Model Construction Idea**: Combine volume-price fingerprints and traditional volume-price factors in a dual-stream GRU to leverage their complementary information [67][68]
  - **Model Construction Process**:
    1. Separate GRU streams for volume-price fingerprints and traditional factors, followed by feature fusion through configurable weights [67][69]
    2. Training details: similar to the single-stream GRU, with parallel training under different random seeds for robustness [68][69]
  - **Model Evaluation**: The dual-stream GRU model improves prediction accuracy and stability, reducing the overfitting risk of relying on a single data source [68][69]

Model Backtesting Results

- **Volume-Price Fingerprint Model**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **GRU Model with Volume-Price Fingerprint**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **Dual-stream GRU Model**:
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54% [68][74]

Quantitative Factors and Construction Methods

- **Factor Name**: Volume-Price Fingerprint Factor
  - **Factor Construction Idea**: Extract semantic features from intraday volume-price behavior using self-supervised learning [1][9][14]
  - **Factor Construction Process**:
    1. Generate 128-dimensional daily fingerprint vectors using the volume-price fingerprint model [1][16][18]
    2. Ensure high differentiation, low redundancy, and rich information through anti-collapse regularization [43][44][46]
  - **Factor Evaluation**: The fingerprint factor captures hidden market patterns and semantic information, complementing traditional factors [1][9][14]

Factor Backtesting Results

- **Volume-Price Fingerprint Factor**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **Fusion Factor (Volume-Price Fingerprint + Traditional Factors)**:
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54% [68][74]

Index Enhancement Results

- **CSI 300 Enhanced Portfolio**:
  - Annualized Return: 11.00%
  - Excess Annualized Return: 7.12%
  - Tracking Error: 1.74%
  - IR: 4.10
  - Monthly Win Rate: 86.11%
  - Max Drawdown: 1.85% [75][77]
- **CSI 500 Enhanced Portfolio**:
  - Annualized Return: 13.32%
  - Excess Annualized Return: 11.38%
  - Tracking Error: 3.47%
  - IR: 3.28
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 4.76% [78][80]
- **CSI 1000 Enhanced Portfolio**:
  - Annualized Return: 13.23%
  - Excess Annualized Return: 14.84%
  - Tracking Error: 3.45%
  - IR: 4.30
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 2.95% [82][83]
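As a reading aid, the two standardization formulas above can be sketched in numpy. The array shapes and sample values are hypothetical; this is an illustration of the formulas, not the report's actual pipeline.

```python
import numpy as np

def standardize_price(minute_prices: np.ndarray, open_price: float) -> np.ndarray:
    """Price features: p~_{t,d} = p_{t,d} / p_open - 1 (return relative to the open)."""
    return minute_prices / open_price - 1.0

def standardize_flow(minute_flows: np.ndarray, hist_flows: np.ndarray) -> np.ndarray:
    """Transaction features: divide by S_d, the average total daily flow over
    N_hist historical days. hist_flows has shape [N_hist, T]."""
    s_d = hist_flows.sum(axis=1).mean()  # (1/N_hist) * sum_i sum_t f_{t,d}^{(i)}
    return minute_flows / s_d

# Hypothetical one-day, four-minute example with two historical days (T = 4).
prices = np.array([10.0, 10.2, 9.9, 10.1])
flows = np.array([100.0, 50.0, 80.0, 70.0])
hist = np.array([[90.0, 60.0, 70.0, 80.0],
                 [110.0, 40.0, 90.0, 60.0]])

p_std = standardize_price(prices, open_price=10.0)
f_std = standardize_flow(flows, hist)
print(p_std)  # relative returns versus the open price
print(f_std)  # flows as a fraction of the average historical daily total
```

Dividing by a historical average rather than a same-day quantity keeps the scale comparable across stocks while avoiding look-ahead within the day, which is consistent with the strict-causality constraint the forward task imposes.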
A Face Robot Makes the Cover of Science Robotics: Using AI to Teach a Bionic Facial Robot to "Open Its Mouth and Speak"
机器之心· 2026-01-15 04:31
Hu Yuhang (online handle "U 航") holds a Ph.D. from Columbia University in the United States and is the founder of 首形科技. He has long focused on research into autonomous learning for robots, with results published in top international journals including Nature Machine Intelligence and Science Robotics. His work aims to endow robots with a "self-model": an internal representation of their own physical structure and motion that lets robots better understand themselves and adapt to changing morphologies, environments, and tasks. In bionic human-robot interaction, he proposed an integrated emotion understanding and expression system that fuses speech, vision, and motion, giving robots more natural interaction capabilities. Through a self-supervised learning mechanism, his approach lets robots continually improve the quality of human-robot interaction without manual intervention, steadily advancing toward agents with lifelong learning ability.

Paper: https://www.science.org/doi/10.1126/scirobotics.adx3017

Previously published papers:

On January 15, 2026, a breakthrough study from Columbia University's School of Engineering was officially published in Science Robotics and featured on the journal's cover. The research demonstrates a new robotics technology: a humanoid robot with a bionic facial structure that uses deep learning to achieve realistic lip movements synchronized with speech and songs. Following human speech, it can precisely open ...
Medical Imaging Diagnosis May Bid Farewell to the "Manual Annotation Era"
Huanqiu Wang Zixun· 2026-01-07 01:18
Core Insights
- The article discusses the development of an AI model named AFLoc, which can autonomously identify lesions in medical images without prior annotation by doctors [1][3].

Group 1: AI Model Development
- The AFLoc model learns from two types of information: medical images (such as chest X-rays, fundus photos, and pathological slices) and clinical reports written by doctors [3].
- Through repeated "contrastive learning," AFLoc gradually learns to identify the most likely lesion locations in images, even without manual annotations [3].

Group 2: Performance Validation
- The research team systematically validated AFLoc on three typical medical imaging modalities: chest X-rays, fundus images, and tissue pathology images, with excellent performance across all three [3].
- In chest X-ray experiments, AFLoc outperformed existing methods on multiple lesion localization metrics across 34 common chest diseases and 8 mainstream public datasets, achieving results that meet or exceed human expert levels [3].
- AFLoc also demonstrated strong disease diagnosis capabilities, outperforming current methods in zero-shot classification tasks for chest X-rays, fundus, and tissue pathology images, and excelling in particular at diagnosing retinal diseases [3].

Group 3: Implications for Clinical Use
- The model avoids traditional deep learning's reliance on large-scale manually annotated data, significantly improving the efficiency of medical image data utilization and the model's generalization ability [5].
- AFLoc offers a feasible path for transitioning clinical imaging AI from "manual annotation dependence" to "self-supervised learning," providing a new technical paradigm for building smarter, more versatile medical AI systems [5].
- The research team plans to further validate and apply AFLoc in real clinical settings, accelerating its translation into a clinical decision support system [5].
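The image-report "contrastive learning" described above is commonly implemented as a CLIP-style symmetric InfoNCE objective: matched image/report pairs should score higher than every mismatched pair in a batch. The numpy sketch below is a generic illustration under that assumption; the embeddings, batch size, and temperature are hypothetical, not AFLoc's actual design.

```python
import numpy as np

def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of img_emb and row i of txt_emb form a
    matched pair; all other rows in the batch serve as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # [batch, batch] cosine similarities
    labels = np.arange(len(logits))

    def xent(l):                                  # cross-entropy, diagonal is the target
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    return (xent(logits) + xent(logits.T)) / 2    # image->text and text->image

rng = np.random.default_rng(0)
pairs = rng.normal(size=(4, 32))
# Well-aligned pairs (image embedding almost equals its report embedding).
aligned = info_nce(pairs + 0.01 * rng.normal(size=(4, 32)), pairs)
# Unrelated embeddings, for comparison.
random_ = info_nce(rng.normal(size=(4, 32)), rng.normal(size=(4, 32)))
print(f"aligned={aligned:.3f} random={random_:.3f}")
```

Training drives the loss toward the aligned regime, which is what lets the model later localize lesions by matching image regions against report phrases without any pixel-level labels.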
Can Autoregression Also Build Strong Vision Models? NEPA Ushers in the "Next-Embedding Prediction" Era, with Saining Xie Participating
机器之心· 2026-01-02 05:00
Core Viewpoint
- The article discusses a new approach to visual pre-training called Next-Embedding Predictive Autoregression (NEPA), which shifts the paradigm from learning representations to learning models and demonstrates strong performance on visual tasks, much like language models [2][18].

Group 1: NEPA Overview
- NEPA is a minimalist approach that predicts the next feature patch of an image, akin to how language models predict the next word [20].
- The method uses causal masking and stop-gradient techniques to keep predictions stable without requiring complex architectures [17][25].
- NEPA shows competitive performance on benchmarks like ImageNet-1K, achieving Top-1 accuracy of 83.8% with ViT-B and 85.3% with ViT-L, surpassing several state-of-the-art methods [29].

Group 2: Methodology and Architecture
- The architecture employs a standard vision Transformer (ViT) backbone with causal attention masking, directly predicting future image-patch embeddings from past embeddings [22].
- Unlike pixel-level reconstruction methods, NEPA needs no separate decoder, simplifying the model design [22].
- Training segments images into patches, encodes them into vectors, and predicts the next patch, with stop-gradient preventing the model from "cheating" [25].

Group 3: Performance and Applications
- NEPA transfers well, achieving 48.3% and 54.0% mIoU on the ADE20K semantic segmentation task, indicating it learns the rich semantic features needed for dense prediction [29].
- The model adapts to various downstream tasks by simply swapping the classification head, showcasing its versatility [30].
- Visual analysis shows NEPA learns long-range, object-centered attention patterns, effectively ignoring background noise and focusing on semantically relevant regions [37].
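The core objective, predicting the next patch embedding with stop-gradient on the target, can be sketched as follows. This numpy toy keeps only the loss shape: the linear predictor and negative-cosine loss are assumptions for illustration, and the causal ViT backbone is omitted entirely.

```python
import numpy as np

def nepa_loss(embeddings: np.ndarray, predictor: np.ndarray) -> float:
    """Next-embedding prediction: from the embedding at position t, predict the
    embedding at position t+1. In a real framework the targets would be wrapped
    in stop-gradient so only the prediction branch receives updates."""
    preds = embeddings[:-1] @ predictor   # position t uses only information up to t
    targets = embeddings[1:]              # treated as constants (stop-gradient)
    preds = preds / np.linalg.norm(preds, axis=1, keepdims=True)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return float(-(preds * targets).sum(axis=1).mean())  # negative cosine similarity

# A perfectly predictable sequence with an identity predictor reaches the minimum, -1.
seq = np.ones((4, 8))
print(nepa_loss(seq, np.eye(8)))
```

The absence of a decoder in NEPA corresponds to the fact that the loss above lives entirely in embedding space: nothing is ever mapped back to pixels.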
LeCun's Last Paper at Meta
36Kr· 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on a self-supervised learning method called LeJEPA, seen as his farewell work at Meta as he departs the company [1][33].
- LeJEPA introduces a new framework that improves predictive performance by constraining the embedding space to follow a specific statistical distribution [2].

Group 1: LeJEPA Framework
- LeJEPA is based on isotropic Gaussian embeddings and addresses the representation-collapse issue in traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework uses Sketched Isotropic Gaussian Regularization (SIGReg) to achieve distribution matching, recasting the problem as a statistical hypothesis test [6][11].

Group 2: Experimental Validation
- Extensive experiments were conducted on large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- Results indicate that LeJEPA outperforms existing methods while keeping training simple and robust, particularly on domain-specific datasets such as Galaxy10 and Food101 [10].

Group 3: Statistical Insights
- The research shows that an isotropic Gaussian distribution minimizes bias and variance during training, improving stability and accuracy on downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, and a range of experiments confirms the superiority of the isotropic Gaussian distribution [3].

Group 4: Future Directions
- Despite LeCun's departure from Meta, he is reportedly raising funds to found a startup focused on advancing his work on world models, indicating ongoing contributions to the AI field [33][34].
LeCun's Last Paper at Meta? And He's Co-First Author. LeJEPA: Completing the JEPAs Theoretical Puzzle
机器之心· 2025-11-14 01:33
Core Viewpoint
- The article discusses the development of LeJEPA, a new self-supervised learning framework that addresses the limitations of existing Joint Embedding Predictive Architectures (JEPAs) by providing a solid theoretical foundation and eliminating reliance on heuristic methods [4][5][8].

Group 1: Theoretical Foundation
- The research team established that the optimal embedding distribution for JEPAs is an isotropic Gaussian, which minimizes downstream prediction risk across a wide range of tasks [5].
- A novel distribution-matching objective, Sketched Isotropic Gaussian Regularization (SIGReg), was introduced to efficiently push the embeddings toward the ideal isotropic Gaussian distribution [6][8].
- LeJEPA combines the predictive objective of JEPA with SIGReg, yielding a statistically optimal solution that mitigates representation collapse [8][9].

Group 2: Practical Implementation
- LeJEPA is simple, robust, and high-performing thanks to its principled theoretical design, which removes the need for complex heuristics such as gradient stopping and teacher-student networks [9][11].
- Implementing LeJEPA takes only about 50 lines of PyTorch code, making it user-friendly and easy to deploy [11][19].

Group 3: Experimental Validation
- LeJEPA was tested across more than 10 datasets and 60 architectures, matching or surpassing state-of-the-art results, e.g., 79% accuracy on ImageNet-1K with ViT-H/14 [10].
- The framework performed especially well on domain-specific datasets, outperforming DINOv2-based transfer learning and indicating strong in-domain pre-training capability [10][33].

Group 4: Stability and Scalability
- LeJEPA remains stable across hyperparameters and architectures, with the recommended settings yielding competitive performance even at small batch sizes [24][26].
- The design is architecture-agnostic, allowing it to learn high-quality representations across diverse model types [26][27].

Group 5: Semantic Structure Emergence
- LeJEPA's self-supervised training gives rise to semantic structure without explicit supervision, as evidenced by attention patterns that align with object boundaries and salient regions [41][43].
- The attention maps are temporally consistent, enabling unsupervised video segmentation and indicating that the learned features capture both spatial semantics and temporal structure [43].
LeCun's Last Paper at Meta
量子位· 2025-11-13 11:52
Core Insights
- The article discusses the introduction of LeJEPA, a self-supervised learning method developed by Yann LeCun, marking his farewell from Meta [2][3][4].
- LeJEPA addresses the representation-collapse issue in traditional JEPA frameworks by using isotropic Gaussian embeddings and introducing SIGReg regularization to improve model generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is based on isotropic Gaussian embeddings, which effectively mitigate the representation-collapse problem and significantly improve generalization [5].
- The traditional JEPA framework often suffers representation collapse, where the model maps all inputs to a single point and can no longer capture semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzed the impact of the embedding distribution on bias and variance through ordinary least-squares regression, showing that an isotropic Gaussian distribution minimizes both during training [8][9].
- An isotropic Gaussian distribution yields lower bias and variance than non-isotropic distributions, improving stability and accuracy on downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) achieves distribution matching by recasting the problem as hypothesis testing [15][17].
- It combines univariate directional tests with Epps-Pulley tests to assess how well the embedding distribution matches the target isotropic Gaussian [16][17].

Group 4: High-Dimensional Challenges
- SIGReg handles the computational challenges of high-dimensional spaces by combining the SIGReg and predictive losses, with mini-batch training keeping optimization efficient and stable [19][21].
- The total LeJEPA loss is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ balancing their contributions [22].

Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, show that LeJEPA outperforms existing methods while keeping training simple and robust [20][23].
- On domain-specific datasets such as Galaxy10 and Food101, LeJEPA surpassed DINOv2-based transfer learning when pre-trained directly on the target data [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over the three years since LeCun introduced it, focusing on enhancing model expressiveness and reasoning through joint prediction [31][28].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions of y [32].

Group 7: Future Directions
- Although LeJEPA concludes LeCun's research at Meta, it does not mark the end of JEPA's development: LeCun is reportedly raising funds to found a startup focused on world models [72][71].
- LeCun's departure from Meta, while not entirely graceful, caps a significant period of achievement in AI research that advanced the field [74][79].
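The SIGReg recipe summarized above, random one-dimensional projections of the embeddings checked for Gaussianity with an Epps-Pulley-style statistic, can be sketched in numpy. This is a simplified illustration of the idea, not the paper's implementation; the direction count, frequency grid, and uniform weighting are all assumptions made here.

```python
import numpy as np

def sigreg_penalty(emb: np.ndarray, n_dirs: int = 16, seed: int = 0) -> float:
    """Toy SIGReg-style penalty: project embeddings onto random unit directions
    and measure, per direction, how far the 1-D empirical characteristic
    function is from that of a standard Gaussian (Epps-Pulley-style test)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, emb.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = emb @ dirs.T                     # [n_samples, n_dirs] 1-D projections
    ts = np.linspace(-3.0, 3.0, 25)         # grid of frequencies t
    gauss_cf = np.exp(-0.5 * ts**2)         # characteristic function of N(0, 1)
    total = 0.0
    for j in range(proj.shape[1]):
        emp_cf = np.exp(1j * np.outer(ts, proj[:, j])).mean(axis=1)  # empirical CF
        total += np.mean(np.abs(emp_cf - gauss_cf) ** 2)
    return float(total / proj.shape[1])

rng = np.random.default_rng(1)
gaussian = rng.normal(size=(2048, 64))       # already isotropic Gaussian
collapsed = np.ones((2048, 64))              # fully collapsed embeddings
# Gaussian embeddings incur a much smaller penalty than collapsed ones.
print(sigreg_penalty(gaussian), sigreg_penalty(collapsed))
```

In the actual method this penalty is differentiable and added to the predictive loss with the weight λ mentioned above, so the same gradient step that improves prediction also pushes the embedding distribution toward the isotropic Gaussian target.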
Despite His Ordeal at Meta, LeCun Keeps Publishing: New Work Shows JEPAs Don't Just Learn Features, They Also Precisely Perceive Data Density
36Kr· 2025-10-09 11:39
Core Insights
- Yann LeCun's team has discovered that the self-supervised model family JEPAs (Joint Embedding Predictive Architecture) has a hidden ability to learn data density, i.e., how common a data sample is [1][3]
- This finding challenges the long-held belief that JEPAs only learn features and have nothing to do with data density [3][4]

Summary by Sections

JEPAs Overview
- JEPAs are a self-supervised learning framework that can autonomously learn feature patterns from vast amounts of data without manual labeling, making them efficient for tasks like image recognition and cross-modal matching [6][10]

Key Findings
- The breakthrough discovery is that JEPAs accurately learn data density through their anti-collapse mechanism, which was previously thought only to prevent feature collapse [8][10]
- The model's ability to perceive data density is a necessary outcome of training: to satisfy the training constraints, the model must respond to small changes in its samples [8][10]

Practical Application
- The team introduced a key tool called JEPA-SCORE, which quantifies data density by scoring how common a sample is: a higher score indicates a more typical sample, while a lower score suggests rarity or anomaly [10][11]
- JEPA-SCORE is versatile and can be applied across various datasets and JEPAs architectures without additional training [10][11]

Experimental Validation
- Experiments showed that JEPA-SCORE reliably identifies typical and rare samples both in datasets like ImageNet and in unfamiliar datasets, confirming its reliability and general applicability [11][13]

Research Team
- The research was a collaboration among four core researchers from Meta's FAIR, including Randall Balestriero, Nicolas Ballas, and Michael Rabbat, each with significant backgrounds in AI and deep learning [20][22][23]
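The article gives no formula for JEPA-SCORE, so the snippet below is explicitly not the paper's method. It is a generic k-nearest-neighbour density proxy in embedding space, included only to illustrate what "scoring how common a sample is" can look like; all names and numbers are hypothetical.

```python
import numpy as np

def density_score(query: np.ndarray, bank: np.ndarray, k: int = 10) -> float:
    """Illustrative density proxy (NOT the paper's JEPA-SCORE): a sample whose
    embedding sits close to many others ('typical') scores high; a sample far
    from its k nearest neighbours ('rare or anomalous') scores low."""
    dists = np.linalg.norm(bank - query, axis=1)
    knn = np.sort(dists)[:k]
    return float(-knn.mean())               # higher score = denser neighbourhood

rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 16))          # embeddings of a reference set
typical = np.zeros(16)                      # near the bulk of the distribution
outlier = np.full(16, 10.0)                 # far from everything
print(density_score(typical, bank), density_score(outlier, bank))
```

Any such score shares the use cases the article describes, flagging typical versus rare or anomalous samples, though the paper's contribution is that JEPAs provide this signal for free, without a separate density estimator.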
Autonomous Driving Foundation Models Should Be Capability-Oriented, Not Merely Confined to Methods Themselves
自动驾驶之心· 2025-09-16 23:33
Core Insights
- The article discusses the transformative impact of foundation models on autonomous driving perception, shifting from task-specific deep learning models to versatile architectures trained on vast and diverse datasets [2][4]
- It introduces a new classification framework centered on four core capabilities essential for robust performance in dynamic driving environments: general knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning [2][5]

Group 1: Introduction and Background
- Autonomous driving perception is crucial for enabling vehicles to interpret their surroundings in real time, involving key tasks such as object detection, semantic segmentation, and tracking [3]
- Traditional models, designed for specific tasks, scale poorly and generalize badly, particularly in "long-tail scenarios" where rare but critical events occur [3][4]

Group 2: Foundational Models
- Foundation models, developed through self-supervised or unsupervised learning strategies, leverage large-scale datasets to learn general representations applicable across various downstream tasks [4][5]
- These models offer significant advantages for autonomous driving thanks to their inherent generalization capabilities, efficient transfer learning, and reduced reliance on labeled datasets [4][5]

Group 3: Key Capabilities
- The four key dimensions for designing foundation models tailored to autonomous driving perception are:
  1. General Knowledge: the ability to adapt to a wide range of driving scenarios, including rare situations [5][6]
  2. Spatial Understanding: deep comprehension of 3D spatial structures and relationships [5][6]
  3. Multi-Sensor Robustness: maintaining high performance under varying environmental conditions and sensor failures [5][6]
  4. Temporal Reasoning: capturing temporal dependencies and predicting future states of the environment [6]

Group 4: Integration and Challenges
- The article outlines three mechanisms for integrating foundation models into autonomous driving technology stacks: feature-level distillation, pseudo-label supervision, and direct integration [37][40]
- It highlights the challenges of deploying these models, including effective domain adaptation, hallucination risks, and efficiency in real-time applications [58][61]

Group 5: Future Directions
- The article emphasizes the importance of advancing research on foundation models to improve their safety and effectiveness in autonomous driving systems, addressing current limitations and exploring new methodologies [2][5][58]