End-to-End Reinforcement Learning
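AAAI 2026 Oral | LENS: A Large Segmentation Model Based on Unified Reinforced Reasoning
Jiqizhixin (机器之心) · 2025-12-29 04:44
In this work, we study two key problems in large segmentation models, one big and one small. The big problem is the long-discussed issue of generalization: conventional large segmentation models often generalize poorly to unseen prompts and domains. The small problem is a hidden information bottleneck: in earlier large segmentation models, information typically passes from the "thinking brain" (MLLM) to the "segmentation decoder" (SAM) through a single segmentation token, which creates an implicit bottleneck in information transfer.
Text-prompted image segmentation is a key technology for fine-grained visual understanding and is of major strategic importance in frontier fields such as human-computer interaction, embodied intelligence, and robotics. It enables machines to locate and segment arbitrary targets in complex visual scenes according to natural-language instructions.
However, current mainstream approaches, such as those based on supervised fine-tuning (SFT), face a fundamental bottleneck. These methods are essentially static pattern matching: although they perform well on specific datasets, their generalization is often limited, forming a "capability ceiling" that is hard to break through. In particular, performance drops significantly on unseen instructions that require multi-step, complex reasoning, and the root cause is that SFT methods ignore the dynamic, explicit reasoning process during training.
To shatter this capability ceiling, we introduce LE ...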
Diewei Quant (蝶威量化) Wins the "Three-Year Golden Bull Quantitative Institution (Index Enhancement Strategy)" Award
Zhong Zheng Wang· 2025-12-04 09:20
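China Securities Journal, cs.com.cn (reporter Wang Hui): The "2025 Quantitative Industry High-Quality Development Conference and Fintech and Quantitative Institutions Golden Bull Awards Ceremony" was recently held in Shanghai. It was hosted by China Securities Journal, co-organized by 华鑫证券 (Huaxin Securities) and 西岸集团 (West Bund Group), with exclusive academic support from 深圳数据经济研究院 (the Shenzhen data economy research institute). 上海蝶威私募基金管理有限公司 (Shanghai Diewei Private Fund Management Co., Ltd., hereafter Diewei Quant) won the "Three-Year Golden Bull Quantitative Institution (Index Enhancement Strategy)" award for its strong long-term performance and robust investment research system.
On risk control, the company follows a principle of "planning ahead and adjusting in real time". Its investment research system has a built-in dynamic risk budgeting and risk parity framework that adjusts risk allocation in real time according to market volatility and strategy state. These risk control measures are integrated with the reinforcement learning strategy, the portfolio optimizer, and the trade execution module in a single closed-loop system, achieving deep coordination between risk control and investment research and trading.
Regarding the award, the Diewei Quant investment research team said it is both recognition of the past stage and a responsibility. Going forward, Diewei Quant will continue to focus on its main track of end-to-end reinforcement learning, multi-source data mining, and multi-stage portfolio optimization, keep increasing its technology investment, and strive to provide professional institutions and high-net-worth investors with quantitative investment solutions that stand the test of time and deliver a steadier experience, seeking sustainable certainty amid uncertainty.
In quantitative investment research, unlike the staged pipeline of traditional quantitative models, Diewei Quant's core strength is an investment research framework driven by end-to-end reinforcement learning. The framework takes signal generation, position decisions, ...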
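The risk-control passage above mentions a risk parity framework that reallocates risk as volatility changes. The article gives no implementation details, so the following is only a minimal sketch of one common simplification, inverse-volatility weighting, which equalizes standalone risk while ignoring correlations. The function names, asset names, and return windows are all hypothetical and are not taken from the firm's system.

```python
from statistics import pstdev


def inverse_vol_weights(recent_returns):
    """Weight each asset in proportion to 1 / volatility over the given window."""
    vols = {name: pstdev(series) for name, series in recent_returns.items()}
    if any(v == 0 for v in vols.values()):
        raise ValueError("every asset needs non-zero volatility in the window")
    inverse = {name: 1.0 / v for name, v in vols.items()}
    total = sum(inverse.values())
    # Normalize so the weights sum to 1.
    return {name: x / total for name, x in inverse.items()}


def risk_contributions(weights, recent_returns):
    """Standalone risk per asset: weight times volatility (correlations ignored)."""
    return {name: w * pstdev(recent_returns[name]) for name, w in weights.items()}


# Hypothetical return windows; asset_b is twice as volatile as asset_a.
window = {
    "asset_a": [0.01, -0.01, 0.01, -0.01],
    "asset_b": [0.02, -0.02, 0.02, -0.02],
}
weights = inverse_vol_weights(window)
contrib = risk_contributions(weights, window)
print(weights)
print(contrib)
```

Calling `inverse_vol_weights` again on a newer window of returns gives updated weights, which is the sense in which such an allocation can track changing volatility. A full risk parity solver would also use the covariance between assets.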
Liaowang (瞭望) | When Can Robots Shed the Remote Control?
Xin Hua She· 2025-11-18 03:06
Core Insights
- The development of embodied intelligence in China is rapidly advancing, showcasing impressive capabilities in various tasks, but there is a need to look beyond surface-level achievements to understand the actual limitations of current technology [1][5]
- Achieving full autonomy in robots requires significant advancements in their cognitive abilities, particularly in understanding and interacting with the physical world [3][5]

Group 1: Technological Challenges
- The key to overcoming remote control limitations lies in developing a powerful cognitive framework that allows robots to perceive, decide, execute, and provide feedback autonomously [3][5]
- Current advancements in embodied intelligence include the VLA large model, which integrates visual, language, and action modalities to enable robots to understand their environment and execute tasks without human intervention [3][4]
- The development of world models, which simulate environmental dynamics, is crucial for enhancing robots' predictive capabilities and decision-making processes [4][5]

Group 2: Limitations in General Intelligence
- Despite breakthroughs in embodied intelligence, there remains a significant gap in achieving general intelligence, as robots can perform well in specific scenarios but struggle in diverse environments [5][6]
- The integration of tactile feedback into robots is a complex challenge, as it requires multi-dimensional perception capabilities that go beyond visual data [5][6]
- Current algorithms still lack the generalization ability needed for robots to perform effectively across various tasks and environments [6]

Group 3: Standardization and Application
- To accelerate the realization of general intelligence, there is a need for standardized frameworks that can facilitate technology alignment and product deployment in real-world scenarios [7][8]
- Industry organizations are developing classification frameworks for embodied intelligence, similar to those in autonomous driving, to promote technological advancement and application in various fields [7][8]
- The establishment of a four-dimensional, five-level evaluation system for humanoid robots will help define capability requirements and applicable scenarios, thereby enhancing their deployment in sectors like logistics, education, and healthcare [8]
Reshaping Intelligent Mobility with a "Human-Like" Driving Experience: Horizon Robotics Showcases HSD in Hong Kong
Nan Fang Du Shi Bao· 2025-06-13 08:43
Core Insights
- The 2025 International Automotive and Supply Chain Expo in Hong Kong showcased cutting-edge vehicles and technologies, highlighting the future of transportation [2]
- Horizon Robotics, a leading smart driving technology company, presented its Horizon SuperDrive (HSD) and Journey 6 series, receiving the 2025 China Automotive Supply Chain Innovation Achievement Award [2]
- The company emphasizes its commitment to technological innovation and aims to provide optimal, flexible, and upgradable smart driving solutions for global customers [2]

Company Developments
- Horizon Robotics introduced the "User Trust in Smart Driving" formula, focusing on safety, professionalism, and intimacy to enhance user experience in urban driving [3]
- The HSD system features an end-to-end architecture that ensures seamless operation from photon input to trajectory output, providing a human-like driving experience [3]
- The system incorporates end-to-end reinforcement learning and a digital twin world to continuously improve driving capabilities [3]

Product Offerings
- HSD offers various configurations to meet diverse user needs, including HSD 300 for general urban driving, HSD 600 for high-performance L2 urban driving, and HSD 1200 for all-scenario driving [4]
- The modular design of the Horizon Cell "magazine system" allows for easy upgrades of the onboard computing platform, similar to upgrading a personal computer [3][4]

Market Collaborations
- Horizon Robotics has partnered with Chery Group for the mass production of HSD, which will debut in the "Falcon" model under the Exeed brand, expected to launch in Q3 2025 [5]
- The company has established close collaborations with leading Tier-1 companies such as Bosch, Denso, and ZF, with over 8 million units of its pre-installed solutions shipped to date [5]
- Currently, one in three smart vehicles in the market is equipped with Horizon's driving assistance solutions [5]
Terminus (特斯联) Completes Strategic Upgrade: Three Core Businesses Focus on Spatial Intelligence
Jing Ji Guan Cha Wang· 2025-05-22 08:23
Core Viewpoint
- Terminus (特斯联) has submitted an updated prospectus to the Hong Kong Stock Exchange, revealing a strategic upgrade focusing on three key areas: AIoT models, AIoT infrastructure, and AIoT intelligent agents, with an emphasis on spatial intelligence [1][2].

Group 1: Strategic Focus
- Terminus aims to drive industrial upgrades and sustainable development through technology, specifically in the AIoT sector, with products deployed for more than 800 clients across more than 160 cities globally [2].
- The company's AIoT domain model serves as an analytical engine, utilizing a "multi-modal" and "model + system + application" commercialization strategy to create specialized models and intelligent applications for various industries [2][3].
- The upgraded green computing unit supports various advanced chips and models, establishing a fully domestically developed toolset from chips to platforms [3][5].

Group 2: Financial Performance
- In the first year of its strategic upgrade, Terminus reported a revenue increase of 83.2%, reaching 1.843 billion yuan, with a compound annual growth rate of 58.0% over three years [5][6].
- The company's expense ratio decreased from 76.9% in 2023 to 45.0% in 2024, while accounts receivable turnover days improved from 238 days in 2022 to 104 days in 2024, indicating enhanced capital efficiency [5][6].
- Revenue from the AI industrial digitization business rose 162.9%, contributing significantly to overall revenue growth, with a total of 342 clients by the end of 2024 [6].

Group 3: Market Outlook
- The global spatial computing market is projected to grow from approximately $149.59 billion in 2024 to over $1,066.13 billion by 2034, a compound annual growth rate of 21.7%, with the Asia-Pacific market expected to grow at 22.2% [7].
- The company faces the challenge of seizing opportunities in the spatial intelligence sector amid a complex global market landscape [7].
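The growth figures quoted above can be cross-checked with the standard compound annual growth rate formula. This is a minimal sketch, assuming the 2024 to 2034 market projection spans ten compounding years and that the 83.2% revenue increase is year over year. The helper names `cagr` and `prior_value` are illustrative and do not come from the source.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


def prior_value(current, growth):
    """Value one period earlier, given the current value and its growth rate."""
    return current / (1.0 + growth)


# Market size in billions of US dollars, 2024 and 2034 (figures quoted in the summary).
market_cagr = cagr(149.59, 1066.13, 10)

# Revenue in billions of yuan after an 83.2% increase (assumed year over year).
prior_revenue = prior_value(1.843, 0.832)

print(f"Implied market CAGR: {market_cagr:.1%}")
print(f"Implied prior-year revenue: {prior_revenue:.3f} billion yuan")
```

Under those assumptions the implied market growth rate comes out at about 21.7%, matching the quoted figure, and the implied prior-year revenue is roughly 1.006 billion yuan.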
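喝点VC | Sequoia Talks with the OpenAI Deep Research Team: AI Agents Will Be This Year's Most Groundbreaking Technology, and Reinforcement Learning Returns to the Mainstream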
Z Potentials· 2025-03-10 03:07
Core Viewpoint
- The article discusses the launch and capabilities of OpenAI's "Deep Research," an AI agent that utilizes end-to-end reinforcement learning to enhance efficiency in complex information retrieval and reasoning tasks, significantly reducing the time required for knowledge work from hours to minutes [2][10][24].

Group 1: Product Overview
- "Deep Research" is designed to retrieve information from multiple online sources and generate detailed reports, completing tasks in 5 to 30 minutes compared to hours for humans [6][10].
- The product is part of OpenAI's agent series, following the "Operator" agent, with plans for further expansions including a "Shards Seeker" agent [4][6].
- The development of "Deep Research" was inspired by breakthroughs in reasoning paradigms and aims to tackle complex tasks requiring extensive online research and creativity [7][10].

Group 2: Target Users and Applications
- The primary users of "Deep Research" include knowledge workers in various fields such as market analysis, medical research, and personal planning [11][12].
- The product has shown significant utility in scientific research, helping users find relevant literature and data [12].
- It is also beneficial for personal tasks like shopping and travel planning, allowing users to save time and make informed decisions [18][19].

Group 3: Technical Mechanism and Innovations
- "Deep Research" employs a fine-tuned version of OpenAI's advanced reasoning model, o3, specifically trained for complex web browsing and reasoning tasks [24][25].
- The model's ability to dynamically adjust its search strategies during information retrieval sets it apart from traditional search engines [25][26].
- The integration of a "Chain-of-Thought" summary allows users to understand the reasoning process behind the model's search strategies, enhancing transparency [25][26].

Group 4: Future Developments and Impact
- Future plans for "Deep Research" include expanding its capabilities to access private data and improving its analytical functions for more complex tasks [37][38].
- The potential impact of "Deep Research" on various professions, particularly in consulting and healthcare, is significant, as it can drastically reduce the time spent on research tasks [39][40].
- The technology is expected to empower knowledge workers rather than replace them, enhancing their efficiency and decision-making capabilities [39][40].