Multimodal Large Language Models (MLLMs)
From MLLM to Agent: A 10,000-Word Deep Dive into the Security Evolution of Large Models!
自动驾驶之心 (Heart of Autonomous Driving) · 2025-09-03 23:33
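Foreword & the Author's Personal Take
Artificial intelligence has moved beyond text-only interaction into a new stage of multimodal understanding and autonomous decision-making by agents. From large language models (LLMs) that process pure text, to multimodal large language models (MLLMs) that fuse images and audio, to agents capable of perceiving their environment and planning tasks, the capability ceiling of large models keeps rising, but their security risks are growing exponentially along with it. Among these risks, jailbreak attacks are one of the most threatening and have long plagued the large-model ecosystem: attackers use carefully crafted inputs or environmental perturbations to bypass a model's safety mechanisms and induce it to generate illegal, harmful, or unethical content. The consequences range from spreading misinformation and inciting hatred to, in severe cases, cyberattacks and privacy leaks. However, existing research mostly focuses on attacks and defenses for a single model form (e.g., LLMs). It lacks a systematic review of the full LLMs → MLLMs → Agents evolution chain and has yet to establish a unified attack taxonomy, evaluation standards, or defense framework. Against this backdrop, a research team from the School of Software at Henan University and the Institute of Information Engineering, Chinese Academy of Sciences, has produced a comprehensive survey of the field. The survey not only ...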
A Survey of Large-Model-Driven Spatial Intelligence: Advances in Embodied Agents, Smart Cities, and Earth Sciences
欧米伽未来研究所 (Omega Future Research Institute) 2025 · 2025-04-20 14:32
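We live in a world made of space. From moving through our homes, offices, or city streets every day, to planning a trip across mountains and seas, to scientists studying the geographic patterns of climate change and the complex patterns of urban expansion, all of this depends deeply on our ability to perceive, understand, and use space. We call this core capability "spatial intelligence." For a long time, humans have relied on their sensory systems and highly developed brains to continuously explore, adapt to, and reshape their spatial surroundings, evolving unique mechanisms of spatial cognition. Today, with rapid advances in artificial intelligence (AI), and especially the emergence of large language models (LLMs), machines are beginning to show remarkable potential for spatial intelligence. This technological wave led by large models is reaching, with unprecedented depth and breadth, into many fields: from robot navigation at the micro scale, to urban planning and management at the meso scale, to earth science research at the macro scale. This report, jointly released by Tsinghua University and the University of Helsinki in Finland, takes readers on an in-depth exploration: How are large models given a "sense of space"? What increasingly important roles do they play in spatial intelligence tasks across different scales? And on the path toward more advanced spatial intelligence ...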
Core Viewpoint - The article discusses the evolution of spatial intelligence through the development of large language models (LLMs) and their applications across various scales, from micro-level robotics to macro-level earth sciences, highlighting both the opportunities and the challenges in this field [1][2][35].
Section Summaries
Section 1: The Foundation of Spatial Intelligence - How Large Models "Understand" Space
- To possess spatial intelligence, machines must develop effective spatial memory and flexible abstract spatial reasoning capabilities [2][3].
Section 2: Spatial Memory and Knowledge - The "Cognitive Map" in Large Models
- Large models acquire spatial information through "internal absorption" during pre-training and "external invocation" when they need real-time data [4][5].
Section 3: Abstract Spatial Reasoning - Beyond Memorization
- Current large models mainly imitate spatial tasks through language modeling rather than possessing deep spatial reasoning akin to human cognition [9].
Section 4: Multi-Scale Spatial Intelligence Applications Driven by Large Models
- Large models are increasingly important in spatial intelligence tasks at different scales, from individual robots to urban environments and global systems [10][11].
Section 5: Embodied Intelligence - Enhancing Robot Spatial Understanding and Action
- Work on embodied intelligence focuses on enabling robots to perceive, understand, and navigate physical environments effectively [11][12].
Section 6: Urban Spatial Intelligence - Empowering Smarter, More Livable Cities
- Large models are applied in urban settings to enhance spatial understanding, reasoning, and decision-making for better city management [15][16].
Section 7: Earth Spatial Intelligence (ESI) - Insights into Our Planet
- ESI leverages AI and large models to analyze vast amounts of Earth observation data, addressing global challenges such as climate change and resource management [20][21].
Section 8: Challenges and Prospects - The Future of Spatial Intelligence
- Despite significant advances, challenges remain in spatial reasoning, data integration, and model interpretability, calling for continued research and development [29][30].
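A New Paradigm for Multimodal Large Model Alignment: Comprehensive Gains Across 10 Evaluation Dimensions as Kuaishou, the Chinese Academy of Sciences, and Nanjing University Break the Bottleneck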
量子位 (QbitAI) · 2025-02-26 03:51
Core Viewpoint - Despite significant advances in multimodal large language models (MLLMs), existing models are still insufficiently aligned with human preferences, largely because current alignment research concentrates on narrow areas such as reducing hallucinations. As a result, how much aligning with human preferences improves the broader capabilities of MLLMs remains unclear [1].
Group 1: Contributions
- A new dataset of 120,000 finely annotated preference comparison pairs, substantially improving on existing resources in scale, sample diversity, annotation granularity, and quality [5].
- A critique-based reward model that offers better interpretability and more informative feedback than traditional scalar reward mechanisms; a 7B version outperforms existing 72B models [5].
- Dynamic reward scaling, which makes better use of high-quality comparison pairs and improves data utilization efficiency [5].
- A comprehensive evaluation across 10 dimensions and 27 benchmarks, showing significant and consistent performance improvements [5][15].
Group 2: Data Sources and Annotation
- Data sources include image datasets such as LLaVA-OV and VLFeedback, totaling 10 million samples, with video data drawn mainly from ShareGPT4Video [6].
- Annotation covers three dimensions (helpfulness, faithfulness, and ethics), with human experts providing detailed scores and justifications for their rankings [7].
Group 3: Performance Evaluation
- The proposed framework performs competitively with GPT-4o across multiple benchmarks, with notable improvements on custom benchmarks, supporting the effectiveness of the reward signals used in training [10].
- Conversational ability and safety improve significantly: average gains exceed 10% on conversation benchmarks, and unsafe behaviors drop by at least 50% [17].
Group 4: Future Research Directions
- The study highlights room to further exploit the dataset, particularly by using its fine-grained annotations to improve current alignment algorithms and address the limitations of specific benchmarks [21].
- Future work will leverage this detailed information and advanced optimization techniques to improve MLLM alignment and build a more general multimodal learning framework [22].
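The summary names dynamic reward scaling but does not give its form. As a rough illustration only, the sketch below assumes a DPO-style preference loss in which each comparison pair's β is increased with the reward-model score margin between the chosen and rejected responses, so that pairs the reward model separates confidently count more. The `PreferencePair` fields, the `scaled_beta` rule, and the parameters `beta0`, `w`, and `k` are hypothetical choices, not the authors' confirmed formulation.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference comparison pair (hypothetical field layout).

    Log-probs are summed over response tokens; reward_margin is assumed to be
    r(chosen) - r(rejected) as scored by a separate reward model.
    """
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float
    reward_margin: float


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for large positive or negative z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def scaled_beta(margin: float, beta0: float = 0.1, w: float = 0.5, k: float = 0.5) -> float:
    # Assumed scaling rule: beta equals beta0 at zero margin and rises
    # smoothly toward beta0 * (1 + w) as the reward margin grows.
    margin = max(margin, 0.0)  # treat negative margins as "no confidence"
    return beta0 * (1.0 + w * (1.0 - math.exp(-k * margin)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float) -> float:
    # Standard DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    logits = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * logits)


def pair_loss(pair: PreferencePair, beta0: float = 0.1, w: float = 0.5, k: float = 0.5) -> float:
    # DPO loss for one pair, using a beta scaled by that pair's reward margin.
    beta = scaled_beta(pair.reward_margin, beta0, w, k)
    return dpo_loss(pair.policy_chosen, pair.policy_rejected,
                    pair.ref_chosen, pair.ref_rejected, beta)


def batch_loss(pairs: list[PreferencePair], beta0: float = 0.1,
               w: float = 0.5, k: float = 0.5) -> float:
    # Mean per-pair loss over a batch.
    if not pairs:
        raise ValueError("batch must contain at least one pair")
    return sum(pair_loss(p, beta0, w, k) for p in pairs) / len(pairs)
```

Under these assumptions, two pairs with identical log-probabilities but reward margins of 0 and 4 are trained with β values of 0.1 and roughly 0.143, so the loss reacts more strongly to the log-ratio of the pair the reward model is confident about, while near-ties fall back to plain DPO with `beta0`.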