The embedding black box is history: a new framework has models "explain first, then learn the embedding"
量子位· 2025-10-21 09:05
Core Insights
- The article introduces GRACE, a new explainable generative embedding framework developed by researchers from multiple universities, aimed at addressing the limitations of traditional text embedding models [1][6].

Group 1: Background and Limitations
- Text embedding models have evolved from BERT to various newer models, mapping text into vector spaces for tasks such as semantic retrieval and clustering [3].
- A common flaw in these models is treating large language models as "mute encoders" that output vectors without explaining why two texts are similar [4].
- This black-box style of representation becomes a bottleneck in tasks requiring high interpretability and robustness, such as question-answer matching and cross-domain retrieval [5].

Group 2: GRACE Framework Overview
- GRACE recasts contrastive learning as reinforcement learning, redefining the meaning of the contrastive learning signal [6].
- The framework has the model generate explanations (rationales) for a text before learning its embedding, so that it produces logical, semantically consistent reasoning [7][25].
- GRACE consists of three key modules:
  1. Rationale-Generating Policy, which generates explanatory reasoning chains for input texts [8].
  2. Representation Extraction, which combines the input and its rationale to compute the final embedding [9].
  3. Contrastive Rewards, which recasts the contrastive learning objective as a reward function for reinforcement learning updates [11].

Group 3: Training Process
- GRACE can be trained in both supervised and unsupervised settings, using labeled query-document pairs and self-alignment techniques [12][18].
- In the supervised phase, the model learns semantic relationships from a dataset of 1.5 million samples [13].
- In the unsupervised phase, the model generates multiple rationales for each text and is encouraged to produce consistent representations across the different explanations [17].

Group 4: Experimental Results
- GRACE was evaluated across 56 datasets spanning a range of tasks, showing significant performance improvements over baseline models in retrieval, pair classification, and clustering [19][20].
- The results indicate that GRACE enhances embedding quality without sacrificing generative ability, while producing transparent representations that users can inspect [25][27].

Group 5: Conclusion
- Overall, GRACE represents a paradigm shift in embedding models, toward frameworks that can explain their own understanding process, improving both performance and interpretability [28].
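The Contrastive Rewards module above turns a contrastive objective into a scalar reward for policy updates. A minimal sketch of that idea (the function names, interface, and temperature value are illustrative, not GRACE's actual implementation): score the rationale-conditioned embedding against one positive and several negative documents with an InfoNCE-style softmax, and use the probability mass on the positive as the reward.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def contrastive_reward(query_emb: np.ndarray,
                       pos_emb: np.ndarray,
                       neg_embs: list,
                       tau: float = 0.05) -> float:
    """InfoNCE-style reward in (0, 1): close to 1 when the
    (query + rationale) embedding sits much nearer the positive
    document than any of the negatives."""
    logits = np.array([cosine(query_emb, pos_emb)] +
                      [cosine(query_emb, n) for n in neg_embs]) / tau
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])  # probability mass on the positive document
```

A policy-gradient step would then weight the log-likelihood of the generated rationale by this reward, so rationales that lead to more discriminative embeddings are reinforced.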
Musk personally calls on Karpathy to face off against Grok 5; don't mythologize LLMs, AGI is still a decade away
36Kr· 2025-10-21 02:21
Core Insights
- The path to Artificial General Intelligence (AGI) is acknowledged to exist but is fraught with challenges, with a timeline of approximately 10 years suggested for its realization [1][3][12]

Group 1: Challenges in Achieving AGI
- Karpathy highlights several significant challenges on the road to AGI, including sparse reinforcement learning signals, risks of model collapse, and the need for better environments and evaluation frameworks [2][3]
- He critiques the current hype around AI, arguing that the industry has overestimated the intelligence of existing AI systems [1][3]

Group 2: Perspectives on the AGI Timeline
- A 10-year timeline for AGI is optimistic relative to the current hype, reflecting a more realistic calibration of expectations in the field [12][15]
- Karpathy believes that while large language models (LLMs) have made substantial progress, considerable work remains before a fully autonomous AGI can outperform humans at all tasks [17][18]

Group 3: Reinforcement Learning and Learning Paradigms
- Karpathy is skeptical about the effectiveness of traditional reinforcement learning (RL), suggesting it may not be the complete solution for developing AGI [21][24]
- He advocates alternative learning paradigms, such as "agentic interaction," which could give LLMs better opportunities to engage with their environments [24][25]

Group 4: Collaboration vs. Competition
- In a notable exchange, Elon Musk challenged Karpathy to a programming duel against Grok 5, which Karpathy declined, preferring collaboration over competition [4][5]
- This reflects a broader industry sentiment that refining tools and methodologies matters more than competitive showdowns [9][32]

Group 5: Future of AI and Automation
- Karpathy discusses AI's potential to raise productivity across sectors, emphasizing that automation will likely complement human roles rather than completely replace them [34]
- He suggests the future of AI will involve a careful balance between human oversight and AI capability, particularly in programming and decision-making processes [32][33]
Karpathy pours cold water: AGI is 10 years away, and there is no "year of the agent"
36Kr· 2025-10-21 02:15
Core Insights
- Andrej Karpathy discusses the future of AGI and AI over the next decade, emphasizing that current "agents" are still in their early stages and require significant development [1][3][4]
- He predicts that the core architecture of AI will likely remain similar to Transformer models, albeit with some evolution [8][10]

Group 1: Current State of AI
- Karpathy is skeptical of the notion of an "agent era," suggesting it should instead be called "the decade of agents," since agents still need about ten years of research to become truly functional [4][5]
- He identifies key problems with current agents, including insufficient intelligence, weak multimodal capabilities, and an inability to operate computers autonomously [4][5]
- The cognitive limitations of these agents stem from their inability to learn continuously, which Karpathy estimates will take approximately ten years to address [5][6]

Group 2: AI Architecture and Learning
- Karpathy predicts that the fundamental architecture of AI a decade from now will still be based on Transformer models, though it may evolve [8][10]
- He emphasizes that advances in algorithms, data, hardware, and software systems are all equally crucial for progress [12]
- The best way to learn AI, according to Karpathy, is hands-on system building rather than purely theoretical study [12]

Group 3: Limitations of Current Models
- Karpathy critiques current large models for fundamental cognitive limitations, noting that complex work still often requires manual coding rather than relying solely on AI assistance [13][18]
- He categorizes coding approaches into three types: fully manual, manual with auto-completion, and fully AI-driven, with the last being the least effective for complex tasks [15][18]
- The industry is moving too quickly, sometimes shipping subpar results while claiming significant advances [19]

Group 4: Reinforcement Learning Challenges
- Karpathy acknowledges that while reinforcement learning is not perfect, it remains the best available approach compared with earlier methods [22]
- He highlights the challenges of reinforcement learning, including the complexity of problem-solving and the unreliability of evaluation models [23][24]
- Future improvements may require higher-level "meta-learning" or synthetic-data mechanisms, but no successful large-scale implementation exists yet [26]

Group 5: Human vs. Machine Learning
- Karpathy contrasts human learning, which involves reflection and integration of knowledge, with current models, which lack such processes [28][30]
- He argues that true intelligence lies in understanding and generalization rather than mere memorization [30]
- The future of AI should focus on reducing mechanical memorization and strengthening cognitive processes closer to human learning [30]

Group 6: AI's Role in Society
- Karpathy views AI as an extension of computation and believes AGI will be capable of performing any economically valuable task [31]
- He emphasizes AI complementing human work rather than replacing it, suggesting a collaborative approach [34][36]
- He sees superintelligence as a natural extension of societal automation, leading toward a world whose workings humans may understand and control less and less [37][38]
Last spot left! Applying reinforcement learning to humanoids, quadrupeds, robotic arms, and more
具身智能之心· 2025-10-21 00:03
Core Insights
- Reinforcement Learning (RL) remains a significant field, with growing applications in robotics, including humanoid and quadrupedal robots, as well as in product optimization across various industries [1][2][3]
- The complexity of RL poses challenges for newcomers, making it difficult to produce publishable research papers without a structured learning system [5][6][9]

Group 1: Importance of Reinforcement Learning
- RL is crucial for tasks such as gait control in embodied intelligent robots, which is essential for achieving general-purpose capabilities [2]
- Companies like Yushu and Zhiyuan use RL to let humanoid robots perform complex actions such as climbing stairs, running, and dancing, enabling applications in rescue and hazardous environments [2][8]

Group 2: Challenges in Learning and Research
- The breadth and intricacy of RL make it hard for beginners to enter the field, often leading to frustration and abandoned study [5][9]
- Producing a paper that survives peer review requires proficiency in methodology, experimental results, and writing; a misstep in any of these can draw low scores from reviewers [5][6]

Group 3: Educational Initiatives
- To lower the entry barriers to RL research, a specialized 1-on-6 mentoring course has been launched, targeting graduate students and others who need guidance in paper writing [6][7]
- The course includes weekly live sessions, project implementation, experimental guidance, and writing refinement, aiming to help participants produce a draft suitable for submission to top conferences and journals [7][9][15]

Group 4: Course Structure and Content
- The course spans 14 weeks of intensive online training followed by 8 weeks of follow-up support, covering various aspects of RL and robotics [9][15]
- Key topics include foundational RL concepts, simulation environments, sim2real techniques, and writing guidance, with a structured approach to ensure participants reach measurable milestones [15][19][20]
Tencent Research Institute AI Express 20251021
腾讯研究院· 2025-10-20 16:01
Group 1: Oracle's AI Supercomputer
- Oracle launched the world's largest cloud AI supercomputer, OCI Zettascale10, consisting of 800,000 NVIDIA GPUs with a peak performance of 16 ZettaFLOPS, serving as the core computing power for OpenAI's "Stargate" cluster [1]
- The supercomputer uses a custom Acceleron RoCE network architecture, significantly reducing GPU-to-GPU communication latency and switching paths automatically on failures [1]
- Service is expected to reach customers in the second half of 2026; the peak figure may be based on low-precision computing metrics and still needs validation in practical applications [1]

Group 2: Google's Gemini 3.0
- Google's Gemini 3.0 appears to have surfaced in the LMArena under the aliases lithiumflow (Pro version) and orionmist (Flash version), with Gemini 3 Pro reportedly the first AI model able to accurately read clock faces [2]
- Testing shows Gemini 3 Pro excels at SVG drawing and music composition, mimicking musical styles while maintaining rhythm, with significantly improved visual performance over previous versions [2]
- Despite the notable gains in model capability, the AI community's evaluation methods remain traditional, lacking innovative assessment techniques [2]

Group 3: DeepSeek's OCR Model
- DeepSeek has open-sourced DeepSeek-OCR, a 3-billion-parameter OCR model that maintains 97% accuracy at compression ratios below 10x and around 60% accuracy at 20x compression [3]
- The model pairs DeepEncoder (380M parameters) with a DeepSeek 3B-MoE decoder (570M activated parameters), outperforming GOT-OCR2.0 on OmniDocBench tests using only 100 visual tokens [3]
- A single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data daily, with recognition support for nearly 100 languages, showcasing its efficient visual-text compression potential [3]

Group 4: Yuanbao AI Recording Pen
- Yuanbao has introduced a new AI recording-pen feature that uses Tencent's Tianlai noise-reduction technology for clear, accurate recording and transcription without additional hardware [4]
- The "Inner OS" feature interprets the speaker's underlying intent and nuance, helping users stay focused on the core content of meetings or conversations [4]
- Recordings can intelligently separate multiple speakers within a single audio segment, making meeting notes clearer without repeated listening [4]

Group 5: Vidu's Q2 Features
- Vidu's Q2 reference-generation feature officially launched globally on October 21, with inference three times faster than the Q1 version, supporting multi-subject consistency generation and precise semantic understanding while maintaining 1080p HD video quality [5][6]
- The video-extension feature lets free users generate videos up to 30 seconds long and paid users extend videos up to 5 minutes, supporting text-to-video, image-to-video, and reference-video generation [6]
- The Vidu app has undergone a comprehensive redesign, transitioning from an AI creation platform to a one-stop AI content social platform, with a large subject library for easy collaborative video generation [6]

Group 6: Gemini's Geolocation Intelligence
- Google has opened the Gemini API's Google Maps integration to all developers, providing location awareness for 250 million places at $25 per 1,000 grounded prompts [7]
- The feature supports the Gemini 2.5 Flash-Lite, 2.5 Pro, 2.5 Flash, and 2.0 Flash models, applicable to scenarios such as restaurant recommendations, route planning, and travel itineraries, with real-time traffic and business-hours queries [7]
- The move signals AI's shift from static tool to dynamic "intelligent space"; domestic competitor Amap has already launched similar smart applications [7]

Group 7: AI Trading Experiment
- The Alpha Arena experiment run by nof1.ai allocated $10,000 each to GPT-5, Gemini 2.5 Pro, Claude 4.5 Sonnet, Grok 4, Qwen3 Max, and DeepSeek V3.1 for real-market trading, with DeepSeek V3.1 ranking first at over $3,500 in profit [8]
- DeepSeek secured the highest returns with only five trades, Grok 4 followed closely with one trade, and Gemini 2.5 Pro lost the most across 45 trades [8]
- The experiment treats the financial market as an ultimate test of intelligence, emphasizing survival under uncertainty over raw cognitive capability [8]

Group 8: Robotics Development
- Yushu (Unitree) has released its fourth humanoid robot, H2, standing 180 cm tall and weighing 70 kg (BMI 21.6), with 31 joints, about 19% more than the R1 model [9]
- H2's movement fluidity and bionic features are significantly upgraded: it can perform ballet and martial arts and has a "face," earning it the title of "most human-like bionic robot" [9]
- Compared with its predecessor H1, H2's joint control and balance algorithms have been greatly optimized, extending its application prospects from industrial automation to entertainment and companionship services [9]

Group 9: Karpathy's Insights on AGI
- In a podcast, Karpathy said achieving AGI may still take a decade, a view 5-10 times more cautious than the prevailing optimism in Silicon Valley [10]
- He criticized the inefficiency of reinforcement learning, likening it to "sucking supervision signals through a straw," and highlighted its susceptibility to noise and interference [10]
- He introduced the concept of a "cognitive core," suggesting future models will first grow larger, then shrink toward a smaller, more specialized cognitive nucleus [11]
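The optical-compression figures in Group 3 imply a simple token budget. A minimal sketch of the arithmetic (the function name and interface are illustrative, not DeepSeek-OCR's API):

```python
import math

def vision_token_budget(text_tokens: int, compression_ratio: float) -> int:
    """Vision tokens needed to optically encode a page that would take
    `text_tokens` text tokens, at a given text-to-vision compression
    ratio. At 10x compression a 1,000-token page needs ~100 vision
    tokens, the regime where DeepSeek-OCR reports ~97% accuracy."""
    return math.ceil(text_tokens / compression_ratio)
```

Pushing the ratio to 20x halves the budget again, but per the article accuracy drops to around 60%, so the ratio is a quality/throughput dial rather than a free win.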
Karpathy responds to the controversy: RL isn't actually hopeless, and "agents need another decade" is in fact an optimistic prediction
Founder Park· 2025-10-20 12:45
Group 1
- The core viewpoint expressed by Andrej Karpathy is that the development of Artificial General Intelligence (AGI) is still a long way off; a timeline of approximately ten years counts as optimistic in the current hype environment [10][21][23]
- Karpathy acknowledges the significant progress made in Large Language Models (LLMs) but emphasizes that considerable work remains before AI can outperform humans at any job [11][12]
- He critiques the current state of LLMs, suggesting they have cognitive flaws and are overly reliant on pre-training data, which may not be a sustainable way to learn [13][14]

Group 2
- Karpathy is skeptical about the effectiveness of reinforcement learning (RL), arguing that it has a poor signal-to-noise ratio and is often misapplied [15][16]
- He proposes that future learning paradigms should focus on agentic interaction rather than relying solely on RL, indicating a shift toward more effective learning mechanisms [15][16]
- He introduces the concept of a "cognitive core," suggesting that LLMs should be slimmed down to improve generalization, moving away from excessive reliance on memorization [19]

Group 3
- Karpathy critiques the current development of autonomous agents, advocating a more collaborative approach in which LLMs assist rather than operate independently [20][21]
- He believes the next decade will be crucial for the evolution of agents, with significant improvements expected in their capabilities [21][22]
- The discussion highlights the need for realistic expectations about agents, warning against overestimating their current capabilities [20][21]

Group 4
- Karpathy emphasizes the limitations of LLMs in coding tasks, noting that they often misinterpret context and produce suboptimal code [47][48]
- While LLMs can assist in certain coding scenarios, they struggle with unique or complex implementations that deviate from common patterns [48][49]
- The conversation reveals a gap between the capabilities of LLMs and the expectations for their role in software development, indicating a need for further advances [52]
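The "poor signal-to-noise ratio" critique in Group 2 can be made concrete: under outcome-only policy gradients, a single scalar reward is smeared over every step of the episode. A toy sketch of that effect (illustrative only, not any production RL code):

```python
import numpy as np

def outcome_only_step_weights(num_steps: int,
                              terminal_reward: float) -> np.ndarray:
    """Per-step gradient weights under outcome-only REINFORCE: the one
    scalar reward multiplies every step's log-prob identically, so the
    update cannot distinguish the decisive step from the wasted ones."""
    return np.full(num_steps, terminal_reward)
```

A 500-step episode that happens to succeed reinforces all 500 actions equally, good and bad alike; this uniform smearing is the narrow "straw" through which supervision is sucked.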
LLM memory management finally no longer needs hand-holding: a new framework lets agents manage their memory systems autonomously
量子位· 2025-10-20 10:29
Core Insights
- The article introduces Mem-α, an innovative reinforcement learning framework designed to enable large language models (LLMs) to autonomously manage complex memory systems, moving away from reliance on manual design and predefined instructions [2][4][14].

Memory Management Challenges
- Traditional memory-enhanced agents often depend on predefined instructions and tools for memory updates, which can lead to suboptimal memory construction and information loss, particularly in long-term interactions [7][8][9].
- LLMs are limited by finite context windows, making external memory systems crucial for understanding long-term information [5][6].

Mem-α Framework
- Mem-α casts memory construction as a sequential decision-making problem optimized through reinforcement learning, allowing agents to explore optimal memory-management strategies while processing information [14][16].
- The framework incorporates a memory system inspired by cognitive science, consisting of core memory, episodic memory, and semantic memory, each supporting its own memory operations [20][22].

Training and Evaluation
- Mem-α uses a multi-dimensional reward function to optimize memory construction, covering accurate retrieval, test-time learning, long-range understanding, and conflict resolution [18][28].
- Experimental results demonstrate that Mem-α significantly outperforms existing methods, achieving higher accuracy with efficient memory usage [35][36].

Key Findings
- Mem-α shows superior performance across all tasks, particularly accurate retrieval and long-range understanding, indicating strong generalization capabilities [35].
- The framework reduces memory usage by approximately 50% compared with traditional methods while improving performance, validating the effectiveness of its semantic-compression mechanism [35].
- Its structured architecture proves essential for processing complex information, highlighting the limitations of flat memory representations [35].
- Mem-α generalizes robustly to documents exceeding 400K tokens despite being trained on documents averaging less than 30K tokens [35].
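The three-store layout and the semantic-compression result above can be sketched as a toy data structure (the store names follow the article; the operation names and summarization hook are hypothetical, not Mem-α's actual tool API):

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy three-store memory in the spirit of Mem-α's cognitive-science
    layout: core (persistent facts), episodic (raw events), semantic
    (distilled summaries)."""
    core: dict = field(default_factory=dict)
    episodic: list = field(default_factory=list)
    semantic: list = field(default_factory=list)

    def insert(self, store: str, item):
        if store == "core":
            key, value = item
            self.core[key] = value      # overwrite = naive conflict resolution
        elif store == "episodic":
            self.episodic.append(item)
        elif store == "semantic":
            self.semantic.append(item)
        else:
            raise ValueError(f"unknown store: {store}")

    def compress_episodic(self, summarize, keep_last: int = 2):
        """Semantic compression: fold older episodes into one summary
        entry, keeping only the most recent episodes verbatim."""
        old, self.episodic = self.episodic[:-keep_last], self.episodic[-keep_last:]
        if old:
            self.semantic.append(summarize(old))
```

In Mem-α, which operation to apply (and when to compress) is what the RL policy learns, rewarded by downstream retrieval accuracy and memory efficiency; here the choices are simply made by the caller.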
The 具身智能之心 exchange groups are live! VLA, RL, navigation, data collection, and many other directions
具身智能之心· 2025-10-20 10:00
Group 1
- A technical exchange group focused on embodied intelligence has been established, inviting participation from various stakeholders in the field [1]
- The group encompasses nearly 20 sub-directions, indicating a broad scope of interest and expertise within the embodied-intelligence domain [1]
- Participants are encouraged to discuss humanoid robots, quadrupeds, robotic arms, and advanced technologies such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multi-modal perception, simulation, and data collection [1]
NeurIPS 2025 | CMU, Tsinghua, and UT Austin open-source ReinFlow, fine-tuning robot flow-matching policies with online RL
机器之心· 2025-10-20 09:15
About the authors: The first author, Tonghe Zhang, is a graduate student at the Robotics Institute of Carnegie Mellon University, focusing on large models for robotic manipulation and whole-body control algorithms. Co-author Sichang Su is a PhD student at the University of Texas at Austin, working on reinforcement learning and generalist robot policies. The work was advised by Professor Chao Yu of Tsinghua University and the Beijing Zhongguancun Academy, and Professor Yu Wang of Tsinghua University.

Flow matching has undeniably been one of the hottest topics in robot learning this year: as an elegant variant of diffusion models, it is simple and effective, and has become the mainstream approach for low-level robot manipulation policies, widely adopted in state-of-the-art VLA models, including Physical Intelligence's models, LeRobot's SmolVLA, NVIDIA's GR00T, and Tsinghua's recently released RDT2.

Beyond increasing data diversity, reinforcement learning is a highly effective way to further strengthen open-source VLA models. A research team from Carnegie Mellon University, Tsinghua University, and the University of Texas at Austin has proposed ReinFlow, an online reinforcement learning framework for fine-tuning flow-matching policies. The work has been accepted to NeurIPS 2025, and the team has open-sourced a detailed reproduction tutorial, including code, trained weights, and training results.
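As a minimal illustration of what a flow-matching policy is (a sketch under stated assumptions, not ReinFlow's code): sample an action by integrating a learned velocity field from Gaussian noise at t=0 to an action at t=1 with Euler steps. In ReinFlow the field's weights are what online RL fine-tunes; here the field is just a callable, and the one used in the test is the closed-form straight-line target for a single known action, standing in for a trained network.

```python
import numpy as np

def sample_action(velocity_field, obs, action_dim: int,
                  n_steps: int = 10, rng=None) -> np.ndarray:
    """Flow-matching policy rollout: integrate v(a, t, obs) from noise
    (t=0) to an action (t=1) with Euler steps."""
    rng = rng if rng is not None else np.random.default_rng(0)
    a = rng.standard_normal(action_dim)  # a_0 ~ N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt                       # t stays strictly below 1
        a = a + dt * velocity_field(a, t, obs)
    return a
```

With the straight-line conditional field v(a, t) = (a* - a) / (1 - t), Euler integration lands exactly on the target action a* regardless of the starting noise, which is part of why flow matching gets away with only a handful of integration steps.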
AI has torn away the fig leaf of "pseudo-work"
Hu Xiu· 2025-10-20 08:21
Core Insights
- Whether current AI development leads to AGI or merely a more sophisticated word predictor significantly shapes market psychology [2]
- A report from MIT indicated that 95% of corporate AI investments have yielded zero returns, suggesting fragile market sentiment [2]
- AI replacing low-level white-collar jobs could liberate humans for more meaningful work, but many individuals may struggle to adapt [3]

Group 1
- The debate over AI's trajectory is crucial because it addresses whether current advances will lead to AGI or merely enhance predictive capabilities [2]
- Expert opinion on AI's future has a substantial influence on market sentiment, with pessimistic views highlighting the risks of overvaluation [2]
- The notion that AI can handle trivial tasks suggests it may replace jobs that do not draw on higher-level human intelligence [2][3]

Group 2
- In the short term, AI adoption may boost capital's profits, but in the long term it could depress aggregate demand as wealth distribution tilts toward capital [4]
- Historical context indicates that the payoffs of the first internet boom took about a decade to materialize, raising concerns about a potential downturn within the current AI cycle [4]
- The market's resilience may prove more critical than the initial explosive growth of AI technologies [4]