Learning

From Post-Training Back to Pre-Training: Can LLM+RL Go Further in Realizing Its Potential?
机器之心· 2025-06-28 05:22
Core Insights
- The article discusses the potential of combining reinforcement learning (RL) with large language models (LLMs), focusing on the move from post-training to pre-training and the challenges and opportunities this raises [2][3].
Group 1: Transition from Post-training to Pre-training
- The integration of RL with LLMs is seen as a significant technological advance, extending applications from the post-training phase to the pre-training phase [2].
- LLMs traditionally rely on supervised learning, which requires extensive and accurate human-provided data; RL is a viable way to address this limitation [3].
- Because RL generates data through model-environment interaction, it reduces dependence on high-quality labeled data and lowers the requirements for supervision [3][4].
Group 2: Applications and Innovations in RL
- Initial applications of RL in LLMs focused on post-training, with techniques such as Reinforcement Learning from Human Feedback (RLHF) being prominent [4].
- Recent work such as Reinforcement Pre-Training (RPT), by researchers from Microsoft and Tsinghua University, has extended RL to the pre-training phase and shows improved performance on certain benchmarks [4][5].
- RPT redefines the next token prediction (NTP) task as a verifiable reasoning task, potentially unlocking RL's capabilities while reducing reliance on labeled data [5].
Group 3: Challenges and Limitations
- Despite these promising developments, the limitations of RL in LLMs are still being uncovered; the path appears bright, but significant challenges remain [4][6].
- The training data and settings for RPT have yet to be validated on broader text and other foundation models, and the computational demands of RL training remain a challenge [5].
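To make "verifiable" in Group 2 concrete, here is a minimal sketch of a rule-based reward for next-token prediction. It assumes a binary exact-match reward against the corpus token; the function names and the sampled predictions are hypothetical illustrations, not the RPT authors' implementation.

```python
from typing import List


def next_token_reward(predicted: str, ground_truth: str) -> float:
    """Binary rule-based reward: 1.0 if the prediction equals the corpus token.

    Assumption: exact string match stands in for whatever matching rule a
    real system would use.
    """
    return 1.0 if predicted == ground_truth else 0.0


def mean_reward(predictions: List[str], ground_truth: str) -> float:
    """Average reward over several sampled predictions for the same context."""
    if not predictions:
        raise ValueError("need at least one sampled prediction")
    total = sum(next_token_reward(p, ground_truth) for p in predictions)
    return total / len(predictions)


# Hypothetical rollouts: four sampled predictions for one context whose
# actual next token in the corpus is "Paris".
samples = ["Paris", "Lyon", "Paris", "Paris"]
avg = mean_reward(samples, "Paris")
print(avg)  # 0.75
```

The point of the sketch is that the reward is computed from the text itself, so no human label is needed for each training example.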
Four Star OpenAI Researchers "Defect": Meta's Hundred-Million-Dollar Signing Bonuses Finally Get Spent
AI前线· 2025-06-28 05:13
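Compiled by 华卫. According to recent foreign media reports, Meta Platforms has recruited four former OpenAI researchers to join its newly established superintelligence lab. The hires reportedly include Trapit Bansal, who joined the ChatGPT development team in 2022. He is said to have played a key role in launching OpenAI's reinforcement learning effort. Reinforcement learning is an AI training method suited to building reasoning models. The other three OpenAI researchers who have joined Meta are Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai. The three reportedly helped set up OpenAI's Zurich office late last year, and before that worked at DeepMind, the machine learning lab of Google parent Alphabet. The hires come weeks after Meta first disclosed that it was forming a superintelligence research team. The lab will be responsible for developing AI models that can outperform humans across a wide range of tasks. Meta reportedly set up the unit against the backdrop of performance problems with its in-house large language model Llama 4 Behemoth, which was previewed earlier this year but whose release has been delayed over performance concerns. Last week, OpenAI ...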
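Quant Index-Enhanced Funds Enjoy an Excess-Return Feast! The Latest Views from 半鞅, 蒙玺, 龙旗, 橡木, 量盈, and Other Well-Known Quant Private Funds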
私募排排网· 2025-06-28 02:37
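So far this year, market style has shown a clear divergence between large caps and small caps. As sentiment has recovered and market activity has picked up, large-cap stocks have been relatively weak, while small caps, benefiting from higher risk appetite and ample liquidity, have performed especially well, and quant strategies have accumulated significant excess returns. According to 私募排排网 data, as of the end of May 2025, the 574 quant index-enhanced products with reported performance had an average one-year excess return of 24.48%; 539 of them posted positive excess returns, a share of 93.91%. Broken down by third-level strategy, the 47 "other index-enhanced" products led, with an average one-year excess return of 34.74%. (See also: Latest Quant Long-Only Excess-Return Rankings Revealed! 今通, 量创投资, and Others Lead; 进化论, 龙旗, 幻方, and Others Make the List!) This article was first published on the WeChat official account "私募排排网". 半鞅私募基金: Index-enhanced products have delivered especially strong excess returns overall this year. On the surface, this reflects high trading activity and greater dispersion among stocks, an environment that gives quant managers abundant trading opportunities and makes excess returns easier to capture. The deeper cause is that, with yields on fixed-income assets falling across the board, the equity market has drawn more investor attention thanks to its relatively higher potential returns and a degree of "backstop" support, and new money has kept flowing in. Meanwhile, the added market uncertainty since Trump took office has further boosted volatility and trading activity. ...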
Before 35, Do These 7 Things as Early as You Can!
天天基金网· 2025-06-28 01:39
Core Viewpoint
- Investing in health is essential for a prosperous future, which means regular exercise, healthy eating, and annual health check-ups [1].
Group 1: Health Investment
- Play a preferred sport three times a week, cut back on processed food, and drink more water [1].
- Annual health check-ups are crucial for detecting minor health issues early and adjusting accordingly [1].
Group 2: Financial Management
- Raise primary income through hard work and build a habit of mandatory saving [2].
- Save a portion of income first, then learn basic financial management skills [2].
Group 3: Continuous Learning
- Investing in oneself by learning 1-2 new skills a year is highlighted as a highly beneficial investment [3].
- This opens up more opportunities and reduces work-related stress, allowing for passive income growth [3].
Group 4: Execution Strategies
- Start small and stay consistent, for instance with one workout per week or by saving 5% of salary [5].
- Choosing enjoyable activities, such as a preferred sport or an interesting skill, makes the process more engaging [5].
Group 5: Regular Review
- Review savings goals, exercise consistency, and family interactions monthly [6].
- Adjust if goals are not met, and give small rewards for progress [6].
- Balancing effort with relaxation and enjoyment of life is essential for overall well-being [6].
Group 6: Long-term Perspective
- A fulfilling life at 35 is the result of mindful daily management, and acting sooner leads to greater ease in life [7].
Why Hasn't DeepSeek-R2 Been Released Yet?
猿大侠· 2025-06-27 14:57
Core Viewpoint
- The release of DeepSeek-R2 has been delayed because CEO Liang Wenfeng is dissatisfied with its performance and because of a shortage of Nvidia H20 chips, which are critical for its development [1][2][4].
Group 1: Development Timeline
- Anticipation for R2 began after the release of the DeepSeek-V3 model in December last year, which was considered a benchmark for cost-performance [5].
- Initial expectations were that R2 would launch in April, following the V3 upgrade on March 24 [11].
- Despite the release of a paper on inference scaling in April, there has been no official update on R2's launch [12][16].
Group 2: Technical Specifications
- R1's training used 30,000 H20 chips, 10,000 H800 chips, and 10,000 H100 chips, indicating the significant computational resources required for R2 [3].
- Leaked parameters for R2 suggested it would have 1.2 trillion parameters and use 5.2 petabytes of training data, raising questions about its hardware requirements [17].
Group 3: Community Reactions
- Community responses to news of the delay varied: some believed the wait would be worthwhile, while others speculated that R2 might be held until the release of V4 [26][28].
Regions Take Multiple Measures to Advance a Digitally Empowered Learning Society
Xin Hua She· 2025-06-27 14:34
Group 1
- The event focused on building a digitally empowered learning society, with representatives sharing successful cases of digital education initiatives [1].
- The National Open University aims to create a lifelong education network that connects urban and rural areas, serving as a foundation for a learning-oriented society [1].
- The university's digital education platform has served 9.63 million learners in just six months, showcasing the effectiveness of digital technology in enhancing learning resources [1].
Group 2
- Wuhan is leveraging its red education resources to create a "virtual three-dimensional digital memorial hall," expecting 7.26 million visitors in 2024 [2].
- Changsha Civil Affairs Vocational College is digitizing practical teaching scenarios by developing virtual nursing homes and a digital museum of Chinese elderly culture, integrating 48 virtual simulation training systems [2].
- Zhejiang Province is implementing a "five-level connection" framework to build the "Zhejiang Learning Pass" core platform, ensuring educational resources are accessible at the community level [2].
Group 3
- The Ministry of Education plans to strengthen the national lifelong education platform to gather diverse learning resources and promote collaboration among school, family, and social education [2].
Professor Xiao Yanghua: How Far Is Embodied Intelligence from "Emergence"?
36Kr · 2025-06-27 11:30
Group 1
- The development of artificial intelligence (AI) follows two clear trajectories: one represented by AIGC (Artificial Intelligence Generated Content) and the other by embodied intelligence [3][6].
- AIGC is considered a technological revolution because of its foundational nature, its ability to significantly enhance productivity, and its profound impact on societal structures [10][11].
- Embodied intelligence aims to replicate human sensory and action capabilities, but its impact on productivity is seen as limited compared to cognitive intelligence [11][13].
Group 2
- The current stage of AI development emphasizes the quality of data and training strategies over sheer data volume and computational power [3][15].
- The scaling law, which highlights the importance of large datasets and computational resources, is crucial for both AIGC and embodied intelligence [14][15].
- The industry struggles to gather sufficient high-quality data for embodied intelligence, which has far less data available than language models do [20][21].
Group 3
- The future of embodied intelligence relies on its ability to understand and interact with human emotions, making emotional intelligence a core requirement for consumer applications [5][28].
- The development of embodied AI is hindered by the complexity of accurately modeling human experiences and environmental interactions [30][32].
- Innovative data acquisition strategies, such as combining real, synthetic, and simulated data, are needed to overcome current limitations in embodied intelligence training [22][23].
Plumas Bancorp(PLBC) - 2024 Q4 - Earnings Call Presentation
2025-06-27 11:28
Investor Presentation, updated through December 31, 2024. Forward-Looking Statements Disclaimer: The statements contained herein that are not historical facts are forward-looking statements based on management's current expectations and beliefs concerning future developments and their potential effects on the Company. Such statements involve inherent risks and uncertainties, many of which are difficult to predict and are generally beyond our control. There can be no assuranc ...
Your CamScanner (扫描全能王), Valued at 21.7 Billion, Makes a Run at a Hong Kong IPO
量子位· 2025-06-27 10:57
Core Viewpoint
- Shanghai Hehe Information Technology aims to become the "first stock of intelligent text recognition" in Hong Kong, following its earlier listing on the A-share Sci-Tech Innovation Board. The company has shown significant growth in revenue and user engagement, positioning itself as a leader in the AI sector with a focus on text intelligence technology [2][3][4].
Financial Performance
- In 2024, the company reported revenue of 1.438 billion RMB, net profit of 400 million RMB, and a gross margin of 84.3% [4][25].
- Revenue grew at a CAGR of approximately 21% from 2022 to 2024, with revenues of 989 million RMB, 1.187 billion RMB, and 1.438 billion RMB respectively [25].
- The consumer (C-end) business accounted for most of total revenue, contributing 82.2%, 84.3%, and 83.8% in 2022, 2023, and 2024 respectively [27].
User Engagement
- Monthly active users (MAU) of C-end products reached 171 million in 2024, with a paid user ratio of 4.3% [21].
- Among efficiency AI companies with MAU exceeding 100 million, the company ranks first in China and fifth globally [21][22].
Product Portfolio
- The company offers products for both the consumer (C-end) and business (B-end) markets, including CamScanner ("Scan All-in-One") and CamCard ("Business Card All-in-One") for C-end, and "TextIn" and "Qixin Huayan" for B-end [8][12].
- The core technology is multi-modal text intelligence, which enhances efficiency in various applications [14][15].
Market Position
- The company is positioned as a leading AI firm focused on text recognition and processing, competing with major players such as OpenAI, Google, Adobe, and Microsoft [5][6][21].
- The global AI product market is projected to grow from an estimated 46.5 billion USD in 2024 to 228 billion USD by 2029 [66].
Research and Development
- The company has been increasing its R&D investment, with expenditures of 280 million RMB, 323 million RMB, and 390 million RMB from 2022 to 2024, representing about 27% of total revenue [33].
- The workforce consists of 1,053 employees, 60.6% of them in R&D roles, highlighting the company's commitment to innovation [35].
Future Plans
- The funds raised from the Hong Kong listing will primarily be used for R&D, international expansion, and exploring investment and acquisition opportunities [50].
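As a quick check on the growth and ratio figures in this summary, the arithmetic can be reproduced from the quoted numbers. This is a minimal sketch: the inputs are the RMB-million figures stated in the summary, and the helper name `cagr` is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted in the summary, in millions of RMB.
revenue = {2022: 989, 2023: 1187, 2024: 1438}
rd_spend = {2022: 280, 2023: 323, 2024: 390}
net_profit_2024 = 400

# 2022 -> 2024 spans two annual growth periods.
growth = cagr(revenue[2022], revenue[2024], years=2)
print(f"Revenue CAGR 2022-2024: {growth:.1%}")  # 20.6%

# R&D spend as a share of revenue, year by year.
rd_ratio = {year: rd_spend[year] / revenue[year] for year in revenue}
for year, ratio in rd_ratio.items():
    print(f"{year} R&D / revenue: {ratio:.1%}")

# Net margin for 2024.
net_margin = net_profit_2024 / revenue[2024]
print(f"2024 net margin: {net_margin:.1%}")
```

The computed CAGR of roughly 20.6% is consistent with the "approximately 21%" in the summary, and the 2023 and 2024 R&D ratios land close to the quoted "about 27%".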
A Step-by-Step Walkthrough! ALOHA: A Classic Work Combining Low-Cost Bimanual Robots with Imitation Learning
具身智能之心· 2025-06-27 08:36
Core Viewpoint
- The article discusses the ALOHA system, a low-cost open-source hardware system designed for bimanual teleoperation, emphasizing its potential to perform precise manipulation tasks using affordable components and advanced learning algorithms [4][5][8].
Group 1: ALOHA System Overview
- ALOHA costs less than $20,000 and is designed to enable precise manipulation tasks using two low-cost robotic arms and 3D-printed components [7][8].
- The system uses end-to-end imitation learning, training on real demonstrations collected through a custom teleoperation interface [8][10].
Group 2: Challenges in Imitation Learning
- Imitation learning faces challenges such as compounding errors, where small prediction errors accumulate and lead to significant deviations from expert behavior [9][12].
- The article highlights the difficulty of modeling complex physical interactions in tasks, suggesting that learning policies directly from demonstrations is more effective than modeling the entire environment [9][12].
Group 3: Action Chunking with Transformers (ACT)
- The ACT algorithm addresses compounding errors by predicting sequences of actions rather than single steps, improving performance on highly complex tasks [12][13].
- The algorithm has demonstrated an 80-90% success rate on tasks with only 10 minutes of demonstration data [12].
Group 4: Hardware Specifications
- The ALOHA system is built on the principles of low cost, versatility, user-friendliness, repairability, and ease of construction, and uses ViperX 6-DoF robotic arms [17][18].
- The system is designed to perform various tasks, including precise, contact-based, and dynamic operations [20][22].
Group 5: Data Collection and Training
- The system collects human demonstrations to train the policy, focusing on the leader robot's joint positions to capture the operator's intent and force feedback [23][25].
- The training process uses a conditional variational autoencoder (CVAE) to model human data and improve learning from noisy demonstrations [33][55].
Group 6: Experimental Results
- The article presents experimental results showing that action chunking and temporal ensembling significantly enhance the performance of the ACT algorithm [52][54].
- The necessity of high-frequency control is emphasized, with findings indicating that a control frequency of 50Hz allows for more precise and agile task execution [56].
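The two ideas credited in Group 6, action chunking and temporal ensembling, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the policy is queried at every step and returns a chunk of future actions, and overlapping predictions for the same timestep are averaged with exponential weights exp(-m * i), where i = 0 is the oldest prediction. The toy policy, the value of `m`, and all function names are hypothetical and do not come from the ALOHA codebase.

```python
import math
from typing import Callable, Dict, List

Action = List[float]  # one target value per joint


def ensemble(preds: List[Action], m: float = 0.1) -> Action:
    """Weighted average of all predictions for one timestep.

    preds[0] is the oldest prediction; weights decay as exp(-m * i).
    """
    weights = [math.exp(-m * i) for i in range(len(preds))]
    total = sum(weights)
    dim = len(preds[0])
    return [sum(w * p[d] for w, p in zip(weights, preds)) / total for d in range(dim)]


def run_with_chunking(
    policy: Callable[[int], List[Action]], steps: int, m: float = 0.1
) -> List[Action]:
    """Query the policy every step and execute the ensembled action."""
    pending: Dict[int, List[Action]] = {}  # timestep -> predictions, oldest first
    executed: List[Action] = []
    for t in range(steps):
        # Each call predicts a chunk of actions for timesteps t, t+1, ...
        for offset, action in enumerate(policy(t)):
            pending.setdefault(t + offset, []).append(action)
        executed.append(ensemble(pending.pop(t), m))
    return executed


def toy_policy(t: int, chunk_size: int = 3) -> List[Action]:
    """Stand-in for a learned policy on a 1-DoF joint (hypothetical)."""
    return [[float(t + offset)] for offset in range(chunk_size)]


trajectory = run_with_chunking(toy_policy, steps=5)
print(trajectory)
```

Predicting a chunk shortens the effective horizon over which errors can compound, and averaging the overlapping chunks smooths the executed trajectory instead of switching abruptly between them.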