Rich Expression Showcases the Charm of the Chinese Language
Ren Min Ri Bao (People's Daily) · 2025-07-03 00:31
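It must also be recognized that "language is the house of being" and the cultural root of every person. We should cultivate an open mind, but we also need to build a conscious habit of using language correctly. Avoiding distortions of beautiful linguistic imagery and refusing to use cryptic slang and stale memes is a question not only of how to use language and express ourselves better, but also of how to treat culture and live better. By enriching our expression and appreciating culture within the dialectic of innovation and upholding fundamentals, the great river of language will become clearer and more vibrant. People's Daily (July 3, 2025, page 5) (Editors: Hu Yongqiu, Yang Guangyu) A painter I know gave a friend a fan decorated with calligraphy and painting, but the friend was displeased, believing that "fan" (扇, shàn) sounds like "to scatter" or "to part ways" (散, sàn) and is therefore unlucky. In fact, giving someone a decorated fan or an umbrella is a kind gesture, and there is no need to feel slighted because of a homophone. Diverse expressions reflect a range of emotions and a colorful life. "青青之竹形兆直,妙华长竿纷实翼" ("green bamboo, straight in form; fine long stalks spreading like wings"), "竹伞遮云径,藤鞋踏藓矼" ("a bamboo umbrella shades the cloud-veiled path; rattan sandals tread the mossy stepping stones")... In classical poetry, the fan and the umbrella mostly carry positive meanings. Indeed, the Chinese have long used homophones to express good wishes. For example, "sheep" (羊, yáng) echoes "auspicious" (祥, xiáng); the fan (扇, shàn) suggests "kind and understanding" (善解人意, shàn jiě rén yì); and cat (猫, māo) and butterfly (蝶, dié) together echo 耄耋 (mào dié, "advanced old age"), a wish for longevity. This rich expressiveness is a vivid reflection of the depth and breadth of Chinese culture. Language is a living river: it stays vigorous because new meanings and expressions keep flowing into it, which also equips it to portray its era better. Today, some inventive "internet expressions" have spread widely, showing that innovation and creativity are giving Chinese ...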
World's First Survey of VLA for Autonomous Driving Released: A Full Breakdown of VLA Driving Models (McGill & Tsinghua et al.)
自动驾驶之心 (Heart of Autonomous Driving) · 2025-07-02 13:54
Paper authors | Sicong Jiang et al. Editor | 自动驾驶之心 "Has the future of autonomous driving already arrived?" When the three capabilities of vision, language, and action are fused into a single model, where will autonomous driving go next? Recently, research teams from McGill University, Tsinghua University, Xiaomi, and the University of Wisconsin–Madison jointly released the world's first comprehensive survey of Vision-Language-Action (VLA) models for autonomous driving. The paper, titled "A Survey on Vision-Language-Action Models for Autonomous Driving", systematically ...
VQ-VLA: An Action Tokenizer Driven by Large-Scale Synthetic Data, Nearly Tripling Inference Speed
具身智能之心 (Heart of Embodied Intelligence) · 2025-07-02 10:18
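Author | Yating Wang et al. Editor | 具身智能之心 Background: Vision-language-action (VLA) models face two major challenges in multimodal robot control: 1. Inefficient action representation: traditional methods for discretizing continuous actions (e.g., uniform binning) struggle to capture complex spatiotemporal dynamics, so errors accumulate over long-horizon tasks. 2. Data-dependency bottleneck: real-robot data is expensive to collect, which limits model generalization. Core contributions: General action tokenizer framework: a general action tokenizer framework based on a convolutional residual VQ-VAE that replaces traditional binning-based discretization. Synthetic-data-driven scaling: the first demonstration that the synthetic-to-real domain gap for action trajectories is minimal (Table 3 shows that a VQ tokenizer trained purely on synthetic data comes close to the performance of one trained on mixed data on real-world tasks), with the tokenizer trained on ultra-large-scale synthetic data (100 times the scale of prior work). Comprehensive performance gains: significant improvements on three core metrics of VLA models: Success rate: up to 30% higher success on long-horizon tasks (real-robot experiments, Figure 3) Key technical approach 1. Convolutional residual VQ-VA ...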
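To make the contrast between uniform binning and a vector-quantized action tokenizer concrete, here is a minimal Python sketch of residual vector quantization (RVQ) applied to a single action vector (or a flattened action chunk). It is not VQ-VLA's implementation: the convolutional encoder/decoder and codebook training are omitted, and the two small hand-written codebooks (`CODEBOOKS`) and all function names are hypothetical. Each stage picks the codeword closest to whatever the earlier stages left unexplained, so the action becomes a short list of token indices that decodes back by summing the chosen codewords.

```python
from typing import List, Sequence

Vector = List[float]


def uniform_bin(values: Sequence[float], low: float, high: float, n_bins: int) -> List[int]:
    """Baseline: discretize each action dimension independently into equal-width bins."""
    width = (high - low) / n_bins
    tokens = []
    for v in values:
        idx = int((v - low) / width)
        tokens.append(min(max(idx, 0), n_bins - 1))  # clamp out-of-range values
    return tokens


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_vq_encode(x: Sequence[float], codebooks: List[List[Vector]]) -> List[int]:
    """Residual VQ: each stage quantizes the residual left over by the previous stages."""
    residual = list(x)
    tokens = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def residual_vq_decode(tokens: Sequence[int], codebooks: List[List[Vector]]) -> Vector:
    """Reconstruction is the sum of the codewords selected at every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical 2-stage codebooks for a 2-D action (e.g. x/y end-effector deltas).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],        # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],  # fine stage
]

if __name__ == "__main__":
    action = [1.2, 0.1]
    tokens = residual_vq_encode(action, CODEBOOKS)
    print(tokens, residual_vq_decode(tokens, CODEBOOKS))
    print(uniform_bin(action, low=-2.0, high=2.0, n_bins=8))
```

Uniform binning assigns each dimension its own token independently, while each RVQ token describes all dimensions jointly and later stages refine the approximation coarse-to-fine; in a real tokenizer the codebooks would be learned rather than fixed by hand.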
Two Modules of Robot Navigation: How Does Vision-Language Navigation Differ from Goal Navigation?
具身智能之心 (Heart of Embodied Intelligence) · 2025-07-02 10:18
Core Viewpoint - The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes vision-language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4]. Summary by Sections Vision-Language Navigation (VLN) - VLN is fundamentally an instruction-following task that involves understanding language commands, perceiving the environment, and planning movement strategies. A VLN robot system consists of three main modules: a vision-language encoder, an environment-history representation, and an action policy [2]. - The robot processes both language commands and visual observations, so this information must be compressed effectively by the vision-language encoder. Key issues include the choice of encoder and whether to project visual and language representations into a common space [2]. - Policy-network learning has shifted from extracting patterns from labeled datasets to distilling effective planning information from large language models (LLMs) [3]. Goal Navigation - Goal navigation extends VLN by enabling agents to explore unfamiliar 3D environments and plan paths based solely on target descriptions, such as coordinates or images [4]. - Unlike traditional VLN, goal-driven navigation requires the agent to move autonomously from "understanding instructions" to "finding paths," which involves semantic parsing, environmental modeling, and dynamic decision-making [6]. Commercial Application and Demand - Goal-driven navigation technology has been deployed in various verticals, such as last-mile delivery, where it is combined with social navigation algorithms to handle dynamic environments. Examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8]. 
- In sectors like healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, enhancing service efficiency [8]. - The development of humanoid robots has led to an increased focus on adapting navigation technology, with companies like Unitree and Tesla showcasing advanced capabilities [9]. - The growth in this sector has created significant job demand, particularly in navigation roles, which are recognized as one of the first technology subfields to achieve practical application [9]. Knowledge and Learning Challenges - Both VLN and goal navigation encompass a wide range of knowledge areas, including natural language processing, computer vision, reinforcement learning, and graph neural networks. This complexity presents challenges for learners seeking to enhance their interdisciplinary skills [10].
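One of the VLN design questions raised above, whether to project visual and language representations into a common space, can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the feature sizes, the hand-set projection matrices `W_LANG` and `W_VIS` (real systems learn them), and the use of cosine similarity to pick the candidate view that best matches the instruction. It is not the encoder of any particular VLN system.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def project(features: Sequence[float], weight: Matrix) -> List[float]:
    """Linear map into the shared space: out[i] = sum_j weight[i][j] * features[j]."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def choose_view(instruction: Sequence[float], views: List[Sequence[float]],
                w_lang: Matrix, w_vis: Matrix) -> int:
    """Project the instruction and every candidate view into one shared space and
    return the index of the view that best matches the instruction."""
    shared_instr = project(instruction, w_lang)
    scores = [cosine(shared_instr, project(v, w_vis)) for v in views]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy sizes: 3-d language features, 4-d visual features, 2-d shared space.
W_LANG = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_VIS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

if __name__ == "__main__":
    instruction = [0.9, 0.1, 0.5]        # toy encoded instruction
    views = [[0.1, 0.8, 0.3, 0.3],       # toy features of candidate headings
             [0.7, 0.2, 0.9, 0.1],
             [0.0, 0.0, 1.0, 1.0]]
    print(choose_view(instruction, views, W_LANG, W_VIS))
```

Projecting both modalities into one space lets a single similarity score compare them directly; the alternative (keeping separate spaces and fusing them later, e.g. with cross-attention) is not shown here.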
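Claude Put in Charge of Selling Snacks, and It Goes Badly Wrong: Hoarding Tungsten Cubes, Overpricing Cola, and Even Claiming It Would Fire Humans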
36Ke · 2025-07-02 10:08
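"If an AI ran the snack fridge, would it do better than a human?" Anthropic recently answered this slightly absurd question in earnest, and in a rather outlandish way: they actually had Claude 3.7 take over the sales operations of the office mini-fridge, and the result played out like an AI-themed office sitcom. In the experiment, called "Project Vend", Anthropic partnered with the AI safety company Andon Labs to set up a very down-to-earth scenario: Claude acted as a "vending machine operations manager" responsible for a small fridge in a corner of the office, handling day-to-day tasks such as ordering stock, setting prices, collecting payments, and responding to employee requests. Humans order snacks, and it sells tungsten cubes? At first, Claudius (the name given to the Claude agent in the experiment) behaved reasonably well. Employees submitted requests via Slack, such as "get some cola" or "buy some chips," and Claudius dutifully placed orders online and arranged restocking. Things started to get absurd, however, after an employee jokingly asked for "some tungsten cubes." Claudius did not recognize the request as a joke; instead, it enthusiastically launched a procurement drive, ordering large quantities of tungsten cubes and filling the fridge, which was supposed to hold drinks, with blocks of metal. It also tried to sell Coke Zero at $3 (about 21 yuan) a bottle, even after an employee told it directly that "this ...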
AI: The Culprit Accelerating Skill Decline
36Ke · 2025-07-02 07:16
Core Viewpoint - The article argues that over-reliance on Large Language Models (LLMs) is leading to a decline in critical thinking among engineers, emphasizing the need to preserve the essence of programming as a craft [1][3][17]. Group 1: Risks of Over-Reliance on LLMs - Engineers who treat LLMs as partners often prioritize speed over depth of thought, which can lead to a decline in their skills and critical thinking [5][6]. - The use of LLMs can result in a loss of the flow state and creative enjoyment for many developers [7]. - LLMs may produce incorrect code or code with hidden logical flaws, increasing risks if users lack judgment [12]. Group 2: Importance of Program Theory and Entropy - LLMs cannot grasp program theory and program entropy, which are essential for effective programming and understanding the complexities of software development [9][13]. - Program theory emphasizes that programming is about forming insights and theories rather than just writing code, which is crucial for maintaining and modifying software [10][11]. - Program entropy highlights that any modification to a program increases complexity, and only humans can effectively manage this entropy [14][15]. Group 3: Long-Term Value of Human Engineers - The article suggests that LLMs will not replace human engineers, as the unique human ability to think critically and deeply about engineering problems remains irreplaceable [8][18]. - Companies pursuing AI for cost reduction may face new risks and long-term costs, indicating that the value of human engineering skills will persist [18][19].
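Single-Stage LLM Fine-Tuning That Combines Supervision and Reinforcement: No More "Memorize First, Then Practice," with Gains in Both Reasoning and Generalization | CAS & Meituan et al.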
量子位 (QbitAI) · 2025-07-02 02:02
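SRFT team submission | QbitAI. By combining supervised fine-tuning and reinforcement fine-tuning in a single stage, the method lets a large model draw on both expert demonstrations and its own exploratory trial and error during training, effectively improving its reasoning performance. The deep reinforcement learning team at the Institute of Automation, Chinese Academy of Sciences, together with Meituan, has proposed SRFT (Supervised Reinforcement Fine-Tuning), a single-stage supervised-plus-reinforcement fine-tuning method that combines the two training paradigms through an entropy-based dynamic weighting mechanism. For improving the reasoning ability of large language models (LLMs), supervised fine-tuning (SFT) and reinforcement learning (RL, sometimes also called reinforcement fine-tuning, RFT) are the two core technical approaches. Each, however, has its bottleneck: SFT excels at imitating experts' problem-solving, much like "memorizing the textbook," and can quickly give the model a foundation, but it tends toward rote memorization and lacks the ability to apply knowledge flexibly to new problems and find optimal solutions; RFT/RL explores solutions through repeated trial and error, much like "drilling practice problems," and can discover better solutions, but its exploration is inefficient and prone to mode collapse. Researchers therefore usually adopt a two-stage sequential pipeline, SFT→RFT/RL: first learn from a high-quality dataset with SFT, then further optimize and align the LLM policy with RFT/RL (that is, first "memorize the textbook," then "drill problems"). However, this sequential approach not only reduces learning efficiency but often causes the model ...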
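The idea of mixing a supervised term and a reinforcement term in one objective, weighted by policy entropy, can be sketched with plain floats. This is a hypothetical illustration, not SRFT's published formula: the specific rule below (normalized entropy `h` as the SFT weight and `1 - h` as the RL weight), the REINFORCE-style policy-gradient term, and all function names are assumptions chosen only to show the shape of a single-stage objective. Real training would compute these on tensors inside an autograd framework, with the weights treated as constants.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sft_loss(expert_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood the policy assigns to expert demonstration tokens."""
    return -sum(math.log(p) for p in expert_token_probs) / len(expert_token_probs)


def pg_loss(sampled_logprobs: Sequence[float], advantage: float) -> float:
    """REINFORCE-style loss for one self-generated response with a scalar advantage."""
    return -advantage * sum(sampled_logprobs) / len(sampled_logprobs)


def combined_loss(token_dists: List[Sequence[float]], expert_token_probs: Sequence[float],
                  sampled_logprobs: Sequence[float], advantage: float) -> float:
    """Single-stage objective mixing both signals, weighted by normalized policy entropy.

    ASSUMPTION (illustrative only, not SRFT's rule): h in [0, 1] is the mean entropy
    divided by its maximum log(vocab). An uncertain policy (high h) leans on the
    demonstrations; a confident one (low h) leans on its own trial and error.
    Requires a vocabulary of at least 2 tokens.
    """
    vocab = len(token_dists[0])
    mean_entropy = sum(entropy(d) for d in token_dists) / len(token_dists)
    h = mean_entropy / math.log(vocab)
    return h * sft_loss(expert_token_probs) + (1.0 - h) * pg_loss(sampled_logprobs, advantage)


if __name__ == "__main__":
    # Maximally uncertain policy: the loss reduces to the SFT term.
    print(combined_loss([[0.5, 0.5]], [0.5], [math.log(0.5)], advantage=1.0))
    # Fully confident policy: the loss reduces to the policy-gradient term.
    print(combined_loss([[1.0, 0.0]], [0.5], [math.log(0.5)], advantage=2.0))
```

The point of the sketch is only that both signals enter one loss at every step, instead of running SFT to completion before RL starts; how the weights are actually derived from entropy is a detail to take from the paper itself.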
In the Era of Large Models, Where Are Vision Generalist Models Headed?
机器之心 (Synced) · 2025-07-02 00:54
Core Viewpoint - The article discusses the evolution of Vision Generalist Models (VGM) in the context of the rise of multimodal large models, emphasizing the need for a distinct focus on visual data despite the shift towards integrating visual modalities with language models [1][2]. Group 1: VGM Overview - VGM aims to create a unified framework capable of handling various visual tasks and modalities, similar to the success of large language models in natural language processing [7]. - VGM's key capability is its ability to process multimodal inputs, including images, point clouds, and videos, through a shared representation method [7][8]. - The model supports multiple visual tasks simultaneously, allowing for parallel processing within a single framework [8]. Group 2: Data, Tasks, and Evaluation - VGM utilizes large and diverse datasets for training and evaluation, covering various types of visual data to support multimodal learning [9]. - Visual tasks are categorized into four types: image tasks, geometric tasks, time series tasks, and other visual-related tasks [9]. - Modern evaluation methods focus on cross-task generalization and multimodal processing capabilities, differing from traditional single-task assessments [9]. Group 3: Model Design Paradigms - Existing VGM design paradigms focus on unifying different visual modality inputs and diverse task outputs, primarily categorized into encoding-based frameworks and sequence-to-sequence frameworks [12][13]. - Encoding-based frameworks create a shared feature space for different input modalities, while sequence-to-sequence frameworks are suitable for tasks with variable-length inputs and outputs [12][13]. Group 4: Current Progress and Future Directions - Current VGM research has made significant progress in unified processing of multiple tasks and modalities but faces challenges in optimizing framework design and improving training efficiency [16]. 
- Data acquisition and annotation remain bottlenecks for VGM development, with future research likely focusing on automated annotation techniques and large-scale unsupervised learning methods [16]. - Despite challenges, VGM shows extensive potential in practical applications, extending beyond traditional visual tasks to complex multimodal tasks across various fields such as intelligent surveillance, autonomous driving, and robotics [16].
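The summary notes that sequence-to-sequence frameworks suit tasks with variable-length outputs. One common way to fit such outputs into a seq2seq model is to serialize them as discrete tokens; the Python sketch below does this for object detection, similar in spirit to Pix2Seq-style coordinate quantization. It is a hypothetical illustration, not the design of any specific VGM: the token layout (four coordinates then a class token per object), the offset scheme, and the function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]


def boxes_to_tokens(boxes: List[Box], labels: List[int], n_bins: int = 1000) -> List[int]:
    """Serialize detections into one flat token sequence, 5 tokens per object:
    four quantized coordinates followed by a class token. Class tokens are offset
    by n_bins so they never collide with coordinate tokens."""
    tokens: List[int] = []
    for box, label in zip(boxes, labels):
        for coord in box:
            tokens.append(min(int(coord * n_bins), n_bins - 1))  # clamp 1.0 into the last bin
        tokens.append(n_bins + label)
    return tokens


def tokens_to_boxes(tokens: List[int], n_bins: int = 1000) -> Tuple[List[Box], List[int]]:
    """Inverse mapping: coordinates come back as bin centers, and a trailing
    incomplete group (e.g. from a truncated generation) is ignored."""
    boxes: List[Box] = []
    labels: List[int] = []
    usable = len(tokens) - len(tokens) % 5
    for i in range(0, usable, 5):
        x0, y0, x1, y1, cls = tokens[i:i + 5]
        boxes.append(((x0 + 0.5) / n_bins, (y0 + 0.5) / n_bins,
                      (x1 + 0.5) / n_bins, (y1 + 0.5) / n_bins))
        labels.append(cls - n_bins)
    return boxes, labels


if __name__ == "__main__":
    seq = boxes_to_tokens([(0.0, 0.25, 0.5, 1.0)], [2], n_bins=4)
    print(seq)                         # length grows by 5 tokens per detected object
    print(tokens_to_boxes(seq, n_bins=4))
```

Because the number of objects varies per image, the output length varies too, which is exactly the case where a decoder that emits tokens one at a time is more natural than a fixed-size output head.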
Tencent Research Institute AI Express 20250702
腾讯研究院 (Tencent Research Institute) · 2025-07-01 16:38
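Generative AI 3. Meta plans to invest hundreds of billions of dollars over the next few years in AI infrastructure, model training, and talent, aiming to launch a next-generation leading model that surpasses the Llama series within a year. I. Competing for a 350-billion-yuan market: in 2025, Chinese chip makers rush for IPOs, lining up to list 1. Domestic chip companies are racing toward IPOs; nearly 10 "Chinese Nvidias," including Moore Threads and MetaX, have entered the listing process, with growing revenue but continued losses; 2. China's AI chip market could reach 350 billion yuan, which in theory could support 35 GPU companies with annual revenue of 10 billion yuan each, but limited production capacity is a challenge shared across the industry; 3. Domestic GPUs face difficulties such as constrained foundry capacity and an immature ecosystem, and need to look for differentiated opportunities in enterprise (B2B) AI applications or consumer graphics. https://mp.weixin.qq.com/s/MPmn7Eh0qVEIEkgOz8ebww II. Meta launches a "Superintelligence Lab," with ethnic Chinese researchers making up more than half of an 11-person star team 1. Meta has officially established Meta Superintelligence Labs (MSL), which will bring together foundational AI research, large language model development, and AI product teams under newly appointed Chief AI Officer Alexandr Wang; 2. The lab has recruited 11 top AI researchers from OpenAI, Anthropic, and Google, more than half of them ethnic Chinese, including GPT-4o and G ...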
All 42 Listed Banks Rise Together: Can the Rally Continue?
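Recently, the banking sector has kept climbing amid fluctuations. As of the close on July 1, the Wind banking index was up 1.51%, and all 42 A-share listed banks closed higher. Of these, 36 gained more than 1%, with Bank of Suzhou up 5.13% and Xiamen Bank up 3.98%. Industry insiders note that several banks recently held shareholder meetings at which dividends and strategic transformation were the key themes, laying a foundation for further gains in bank stocks. Institutional inflows and bigger dividends fuel bullish sentiment In the first quarter of 2025, commercial banks' cost-to-income ratio was 29%, up 0.05 percentage points from the previous year and essentially stable. Although cost-cutting and efficiency measures are being rolled out faster, operating expenses are relatively rigid and offer limited room for reduction while revenue growth is weak, so the cost-to-income ratio has risen. Notably, at the recent shareholder meetings several banks emphasized "transformation," giving investors a clearer picture of where the banks are heading next. China Merchants Bank President Wang Liang said the bank must adapt to the enormous test posed by the low-interest-rate environment, which is why China Merchants Bank proposed at its work conference early this year to accelerate its "four-pronged" transformation: accelerating internationalization, so that its business structure better meets the financial-service needs of Chinese companies going global and the bank avoids relying simply on a single low-interest-rate market; and accelerating integrated development, so that through diversified operations the bank's ...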
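The link the passage draws between weak revenue growth, rigid expenses, and a rising cost-to-income ratio follows directly from the ratio's definition. As a simple illustration (the notation is introduced here, not taken from the article), write operating expenses as $C_t$, operating income as $R_t$, and their growth rates over the period as $g_C$ and $g_R$:

```latex
\mathrm{CIR}_t = \frac{C_t}{R_t}, \qquad
\frac{\mathrm{CIR}_t}{\mathrm{CIR}_{t-1}}
  = \frac{C_t / C_{t-1}}{R_t / R_{t-1}}
  = \frac{1 + g_C}{1 + g_R}
```

So the ratio rises whenever expenses grow faster (or shrink more slowly) than income, i.e. $g_C > g_R$. When revenue growth is sluggish and expenses are hard to cut, even a small gap nudges the ratio upward, which matches the direction of the 0.05-percentage-point rise to 29% reported for Q1 2025.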