Large Language Models (LLMs)
New from UCLA: A Comprehensive Survey of LLM Time Series Reasoning and Agentic Systems
自动驾驶之心· 2025-09-27 23:33
Core Insights
- The article discusses the emergence of Time Series Reasoning (TSR) as a new field that integrates large language models (LLMs) with time series data analysis, addressing the limitations of traditional methods [2][8][39]
- TSR aims to enhance time series analysis with explicit reasoning, causal inference, and decision-making abilities, moving beyond mere prediction and classification [2][8][39]

Summary by Sections

Traditional Time Series Analysis Limitations
- Traditional methods like ARIMA and LSTM excel at specific tasks but face three key limitations: lack of interpretability, inability to handle causal relationships, and insufficient dynamic responsiveness [8][14]
- LLMs offer new tools to overcome these limitations by providing explicit reasoning processes, generating causal hypotheses, and enabling interaction with external tools [2][8]

Emergence of Time Series Reasoning
- TSR is defined as performing explicit, structured reasoning on time-indexed data using LLMs, integrating multimodal contexts and agent systems [8][39]
- A recent survey from a collaborative team gives TSR a clear definition and presents a three-dimensional classification framework covering reasoning structure, task objectives, and technical features [3][9]

Three-Dimensional Classification Framework
- The framework categorizes TSR along three dimensions: reasoning topology (how reasoning is conducted), core objectives (why reasoning is performed), and attribute labels (auxiliary features of methods) [9][24]
- Reasoning topology includes three types: direct reasoning, linear chain reasoning, and branch-structured reasoning, each with varying complexity and capabilities [12][22]

Reasoning Topology
- Direct reasoning is the simplest form, producing results without showing intermediate steps, which limits interpretability [15]
- Linear chain reasoning introduces ordered steps, enhancing interpretability and modularity (a minimal sketch of this pattern follows the summary) [18]
- Branch-structured reasoning allows multiple paths and self-correction, increasing flexibility and adaptability [22]

Core Objectives of Time Series Reasoning
- The core objectives of TSR fall into four types: traditional time series analysis, explanation and understanding, causal inference and decision-making, and time series generation [24][28]
- Each objective aims to enhance the performance and flexibility of traditional tasks through LLM integration [28]

Attribute Labels
- Attribute labels provide additional features for classifying methods, including control-flow operations, execution agents, information sources, and LLM alignment methods [29][30]
- These labels help researchers position their work and understand the nuances of different approaches [29]

Resources and Tools
- The article emphasizes the importance of resources and tools for advancing the field, categorizing them into reasoning-first benchmarks, reasoning-ready benchmarks, and general-purpose benchmarks [33][36]
- These resources are essential for researchers to test and validate their methodologies effectively [33]

Future Directions and Challenges
- The field faces several challenges, including standardizing evaluation metrics for reasoning quality, integrating multimodal data, and ensuring the robustness and safety of agent systems [38][39]
- Addressing these challenges will define the future trajectory of time series reasoning, aiming for large-scale reliability in critical sectors like finance, healthcare, and energy [39]
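To make the linear-chain topology concrete, below is a minimal sketch of ordered, chained prompting over time-indexed data. It is an illustration under assumptions rather than a method from the survey: `llm` stands in for any text-in/text-out model client, and the three step prompts are hypothetical.

```python
from typing import Callable, List

def linear_chain_reasoning(
    series: List[float],
    question: str,
    llm: Callable[[str], str],  # any text-in/text-out LLM client (assumed interface)
) -> str:
    """Answer a question about a time series via an ordered chain of steps.

    Each step's output is appended to the next step's context, so the
    intermediate reasoning stays explicit and inspectable -- the property
    the survey attributes to linear-chain TSR.
    """
    steps = [
        "Step 1 - Describe the overall trend and any seasonality in the series.",
        "Step 2 - Identify anomalies or change points and when they occur.",
        "Step 3 - Using the observations above, answer the question.",
    ]
    context = f"Time series (equally spaced values): {series}\nQuestion: {question}\n"
    for step in steps:
        context += f"\n{step}\n"
        context += llm(context)  # the model extends the visible reasoning chain
    return context
```

Because each intermediate answer is kept in the transcript, a failed step can be audited or re-run, which is what distinguishes this topology from direct reasoning.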
From MLLMs to Agents: A Long-Form Survey of the Evolution of Large Model Safety
自动驾驶之心· 2025-09-03 23:33
Foreword & the Author's Perspective

Artificial intelligence has moved from single-modality text interaction into a new stage of multimodal understanding and autonomous agent decision-making. From Large Language Models (LLMs) that process pure text, to Multimodal Large Language Models (MLLMs) that fuse images and audio, to Agents capable of environment perception and task planning, the capability ceiling of large models keeps expanding, but safety risks are growing exponentially alongside it.

Among these, jailbreak attacks, one of the most threatening classes of safety risk, have long plagued the large-model ecosystem: attackers use carefully crafted inputs or environmental perturbations to bypass a model's safety mechanisms and induce it to generate illegal, harmful, or unethical content, which at best spreads misinformation and incites hatred, and at worst triggers cyberattacks, privacy leaks, and other severe consequences. However, existing research mostly focuses on attacks and defenses for a single model form (e.g., LLMs), lacking a systematic review of the full LLMs-MLLMs-Agents evolution chain, and has yet to establish a unified attack taxonomy, evaluation standard, or defense framework.

Against this backdrop, a research team from the School of Software at Henan University and the Institute of Information Engineering, Chinese Academy of Sciences, has produced a comprehensive survey of the field. The survey not only sys ...
Speed Always Wins: Shanghai AI Lab's 82-Page Survey of Efficient LLM Architectures
机器之心· 2025-08-25 09:10
Core Insights
- The article discusses the advancements and challenges of large language models (LLMs), emphasizing their transformative impact on human-computer interaction and the need for efficient architectures to overcome high training and inference costs [2][3][8]

Group 1: LLM Architecture and Efficiency
- The efficiency bottleneck of LLMs stems primarily from the Transformer architecture, which, despite its breakthroughs, incurs O(N^2) attention complexity on long-sequence tasks [3][4]
- Recent innovations in Transformer architecture have emerged, but a comprehensive review summarizing these advancements has been lacking [4][5]
- A collaborative effort by Shanghai AI Lab and several institutions has produced a survey of over 440 papers on the latest progress in efficient LLM architectures [5][6]

Group 2: Categories of Efficient Architectures
- The survey divides efficient LLM architectures into seven categories: linear sequence modeling, sparse sequence modeling, efficient full attention, sparse expert models, hybrid model architectures, diffusion language models, and applications to other modalities [6][8]
- Linear sequence modeling aims to reduce attention training and inference complexity without incurring KV-cache overhead (see the linear-attention sketch after this summary) [6][8]
- Sparse sequence modeling leverages the inherent sparsity of attention maps to accelerate computation [21][22]

Group 3: Innovations in Attention Mechanisms
- Efficient full-attention methods optimize memory access and KV storage while maintaining complete attention [22][23]
- Sparse expert models increase model capacity without proportionally increasing computational cost through conditional activation of experts [27][28]
- Hybrid architectures strike a balance between linear/sparse attention and full attention, optimizing both efficiency and performance [35][36]

Group 4: Applications and Future Directions
- Diffusion language models represent a novel approach, transplanting diffusion models from visual tasks to language generation and significantly improving generation speed [38][39]
- Efficient architectures are being applied across other modalities, including vision and audio, demonstrating their versatility and effectiveness [44][45]
- The overarching goal is substantial acceleration of AI development, echoing the phrase "Speed Always Wins", with a focus on efficiently training and deploying powerful models [45]
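As a rough illustration of why linear sequence modeling avoids the O(N^2) cost, the sketch below contrasts standard softmax attention with a kernelized linear variant. It is a didactic toy under assumptions, not code from the survey: the feature map `phi` is a simple ReLU+1 stand-in, and real methods differ in normalization and gating.

```python
import numpy as np

def full_attention(q, k, v):
    """Standard softmax attention: materializes an N x N score matrix, O(N^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                 # (N, d)

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    """Kernelized linear attention: forms phi(K)^T V first, O(N * d^2).

    phi is a positive feature map (a ReLU+1 stand-in here). The d x d
    summary `kv` is independent of sequence length and can be updated
    token-by-token at inference, so no per-token KV cache accumulates.
    """
    q, k = phi(q), phi(k)
    kv = k.T @ v                                       # (d, d) summary of the whole sequence
    normalizer = q @ k.sum(axis=0, keepdims=True).T    # (N, 1)
    return (q @ kv) / (normalizer + 1e-6)

N, d = 1024, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
print(full_attention(q, k, v).shape, linear_attention(q, k, v).shape)
# both (1024, 64); the cost differs: N^2 * d versus N * d^2
```

The key design point is that `k.T @ v` compresses the sequence into a fixed-size state, which is why these methods scale linearly in N at the price of an approximation to full attention.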
Is the AI Top-Conference Model Broken? The "Publish or Perish" Vicious Cycle Is Crushing the AI Research Community
36Kr· 2025-08-13 09:08
Core Insights
- The current AI conference model is deemed unsustainable due to overwhelming submission rates, leading to a focus on quantity over quality in research outputs [4][10][18]
- A significant increase in publication rates has been observed, with the average author publishing over 4.5 papers annually, double the rate of a decade ago [4][18]
- Environmental concerns are raised, particularly regarding carbon emissions from travel, with NeurIPS 2024's travel emissions exceeding the daily carbon output of Vancouver [19]
- Mental health issues are prevalent among researchers, with over 71% of discussions on Reddit expressing negative sentiments and 35% mentioning mental health challenges [22][24]

Challenges Facing AI Conferences
- The exponential growth in submissions is straining the peer-review system, raising concerns about fairness and academic integrity [10][12]
- The rapid pace of AI research often renders findings outdated by the time they are presented at conferences [12][18]
- The physical capacity of venues is being exceeded, as seen with NeurIPS 2024, whose venue capacity of roughly 18,000 led to restricted access for many participants [27]
- The pressure to publish is creating a toxic environment in which researchers prioritize quantity over the depth of their work [7][24]

Proposed Solutions
- The Community-Federated Conference (CFC) model is suggested as a sustainable alternative, separating traditional conference functions into independent yet interconnected layers [29][30]
- The first layer is a centralized digital platform for peer review and publication, allowing rolling submissions throughout the year [31]
- The second layer consists of regional centers for showcasing research, reducing the need for large venues and minimizing carbon footprints [32]
- The third layer emphasizes digital synchronization and collaboration, connecting researchers across regions through virtual channels [33]
Is the AI Top-Conference Model Broken? The "Publish or Perish" Vicious Cycle Is Crushing the AI Research Community
机器之心· 2025-08-13 04:49
Core Viewpoint
- The current model of AI academic conferences is deemed unsustainable due to overwhelming submission rates, environmental impacts, and mental health concerns among researchers [5][11][15]

Group 1: Challenges Facing AI Conferences
- The average annual publication rate in the AI field has exceeded 4.5 papers per author, doubling in the past decade and encouraging a focus on quantity over quality [7][22]
- Travel emissions from NeurIPS 2024 alone exceeded 8,254 tons of CO2 equivalent, surpassing the daily emissions of Vancouver and highlighting the environmental cost of these conferences [23][25]
- Over 71% of Reddit discussions about AI conferences expressed negative sentiments, with 35% mentioning mental health issues such as anxiety and burnout [28][29]

Group 2: Proposed Solutions
- The Community-Federated Conference (CFC) model is proposed as a sustainable and equitable alternative, separating traditional conference functions into three interconnected layers: global peer review, regional centers for knowledge dissemination, and a unified digital platform for collaboration [38][40][41]
- The first layer is a centralized digital platform for peer review and publication, allowing rolling submissions independent of physical conferences [39]
- The second layer consists of regional centers that host local presentations, reducing the need for large venues and minimizing carbon footprints [40]

Group 3: Future Directions
- The CFC model aims to address the structural issues of traditional conferences by promoting local engagement and reducing pressure on authors while maintaining academic rigor [38][41]
- The shift toward a decentralized approach is seen as essential to fostering collaboration and inclusivity within the AI research community [39][40]
Professor Hinton's World Artificial Intelligence Conference Keynote Slides
2025-07-29 02:10
Summary of Key Points from the Conference Call

Industry or Company Involved
- The discussion revolves around the field of Artificial Intelligence (AI), particularly the contrast between digital intelligence and biological intelligence.

Core Points and Arguments
1. **Two Paradigms of Intelligence** - In the symbolic paradigm, the essence of intelligence is reasoning, achieved through symbolic rules manipulating symbolic expressions; learning can be secondary to understanding knowledge representation [7][8][9].
2. **Evolution of Language Models** - Over the past 30 years, language modeling has advanced significantly, including the introduction of embedding vectors and Google's invention of the Transformer [13][14].
3. **Understanding of Language by LLMs** - Large Language Models (LLMs) understand language similarly to humans, converting words into compatible feature vectors, indicating a level of comprehension in their responses [16][28].
4. **Analogy of Words as Lego Blocks** - Words are compared to high-dimensional Lego blocks that can model various concepts and communicate ideas effectively [20][24].
5. **Digital vs. Biological Computation** - Digital computation, while energy-intensive, allows easy knowledge sharing among agents running the same model; biological computation consumes less energy but struggles with knowledge transfer [51].
6. **Knowledge Transfer Mechanisms** - Knowledge can be distilled from a teacher to a student in AI systems, allowing efficient learning and adaptation (a minimal distillation sketch follows this summary) [41][48].
7. **Challenges of AI Control** - A superintelligence could manipulate users to gain power, raising concerns about control and safety in AI development [55][57].
8. **Global Cooperation on AI Safety** - There is skepticism about international collaboration on AI safety measures against threats like cyberattacks and autonomous weapons [64].
9. **Training Benevolent AI** - Techniques for training AI to be benevolent may be independent of those that enhance its intelligence, suggesting a need for focused research on AI safety [68][72].

Other Important but Possibly Overlooked Content
- The discussion emphasizes the risks of AI development, likening the situation to raising a tiger cub that could become dangerous as it matures, and highlighting the urgency of safety measures [61].
- Countries are urged to establish well-funded AI safety institutes focused on building AI systems that do not seek control [72].
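Point 6 refers to distillation, the transfer mechanism the slides contrast with slow, lossy knowledge sharing between biological brains. Below is a minimal, hedged sketch of the standard temperature-softened distillation loss; the toy logits are illustrative, not values from the talk.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax: higher T exposes more of the distribution."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The student learns to match the teacher's full output distribution,
    not just its top answer, which is what lets a model's knowledge be
    copied into another network far faster than humans can share ideas.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return float(kl.mean() * T * T)

teacher = np.array([[4.0, 1.0, 0.5]])  # toy logits over 3 classes
student = np.array([[2.0, 1.5, 1.0]])
print(distillation_loss(student, teacher))  # scalar loss to minimize during training
```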
A Comprehensive Review of Foundation Models for Autonomous Driving (LLMs/VLMs/MLLMs/Diffusion Models/World Models)
自动驾驶之心· 2025-06-21 11:18
Core Insights
- The article discusses the critical role of foundation models in generating and analyzing complex driving scenarios for autonomous vehicles, emphasizing their ability to synthesize diverse, realistic, high-risk safety scenarios [2][4]

Group 1: Foundation Models in Autonomous Driving
- Foundation models can process heterogeneous inputs such as natural language, sensor data, and high-definition maps, enabling the generation and analysis of complex driving scenarios [2]
- A unified classification system is proposed, covering Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), Diffusion Models (DMs), and World Models (WMs) [2][4]

Group 2: Methodologies and Tools
- The article reviews methodologies, open-source datasets, simulation platforms, and benchmark testing challenges relevant to scenario generation and analysis [2]
- Specific evaluation metrics for assessing scenario generation and analysis are discussed, highlighting the need for dedicated assessment standards in this field [2]

Group 3: Current Challenges and Future Directions
- The article identifies open challenges and research questions in scenario generation and analysis, suggesting areas for future research and development [2]
Peking University, Tsinghua, UvA, CMU, and Others Jointly Release the Latest Survey on the Logical Reasoning Capabilities of Large Models
机器之心· 2025-05-07 07:37
Core Viewpoint
- Current research on large language models (LLMs) is shifting from pre-training driven by scaling laws to post-training focused on enhancing reasoning capabilities, particularly logical reasoning, which is crucial for addressing hallucination issues [1][4]

Group 1: Logical Reasoning Challenges
- LLMs exhibit significant deficiencies in logical reasoning, categorized into two main issues: logical question answering and logical consistency [4][9]
- In logical question answering, LLMs struggle to generate correct answers when complex reasoning over given premises and constraints is required [6][10]
- Logical consistency issues arise when LLMs give contradictory answers to related questions, undermining their reliability in high-stakes applications [11][20]

Group 2: Research Methodologies
- The review categorizes existing methods for enhancing logical reasoning into three main approaches: external solvers, prompt engineering, and pre-training with fine-tuning [15][18]
- External-solver methods translate natural-language logic problems into symbolic expressions that an external solver then resolves [16]
- Prompt engineering focuses on designing prompts that guide LLMs to construct explicit logical reasoning chains [17]
- Pre-training and fine-tuning methods incorporate high-quality logical reasoning examples into the training data to improve model performance [18]

Group 3: Logical Consistency Types
- Various forms of logical consistency are identified, including negation consistency, implication consistency, transitivity consistency, fact consistency, and compositional consistency [22][24][26][28]
- Each type has specific requirements, such as ensuring that contradictory statements cannot both be affirmed (negation consistency) or that logical implications are preserved (implication consistency); a minimal negation-consistency check is sketched after this summary [22][24]
- The review emphasizes developing methods that enhance logical consistency across multiple dimensions to improve LLM reliability [28][31]

Group 4: Future Research Directions
- Future research should extend LLMs' reasoning capabilities to modal logic for handling uncertainty and develop efficient algorithms that satisfy multiple forms of logical consistency simultaneously [30][31]
- Training LLMs in higher-order logic is needed to address more complex reasoning challenges [31]

Conclusion
- The survey outlines the current state of research on LLMs' logical reasoning capabilities, highlighting significant challenges and proposing future research directions to improve performance in logical question answering and logical consistency [32]
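As a concrete illustration of negation consistency, here is a minimal sketch of how such a check could be automated. It is an assumption-laden toy, not the survey's evaluation protocol: `ask` is a hypothetical yes/no wrapper around an LLM.

```python
from typing import Callable

def negation_consistent(
    statement: str,
    ask: Callable[[str], bool],  # hypothetical yes/no wrapper around an LLM (assumption)
) -> bool:
    """Check negation consistency: a model should not affirm both a
    statement and its negation.

    Returns True when exactly one of the pair is affirmed. A False result
    is the kind of contradiction the survey flags as undermining
    reliability in high-stakes applications.
    """
    affirms = ask(f"Is the following statement true? {statement}")
    affirms_negation = ask(
        f"Is the following statement true? It is not the case that {statement}"
    )
    return affirms != affirms_negation
```

The same pattern extends to the other consistency types, e.g., implication consistency would check that affirming a premise forces affirmation of its logical consequences.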
Google DeepMind: LLMs Can Be Willful Too, Knowing the Optimal Path Yet Charging Into the Wall Anyway
机器之心· 2025-05-05 03:40
Core Insights
- The article investigates common failure modes of Large Language Models (LLMs) in decision-making scenarios, focusing on greediness, frequency bias, and the knowing-doing gap [2][15]
- It proposes a reinforcement learning fine-tuning (RLFT) method to improve the decision-making capabilities of LLMs by addressing these shortcomings [2][8]

Group 1: Failure Modes
- LLMs exhibit suboptimal exploration and a knowing-doing gap that prevents them from effectively translating knowledge into action [2][15]
- The three identified failure modes are (a toy harness measuring the first follows this summary):
  1. Greediness, where LLMs overly favor actions that have previously performed best [15]
  2. Frequency bias, where LLMs repeat high-frequency actions regardless of their reward differences [5][18]
  3. The knowing-doing gap, where LLMs understand task requirements but fail to execute optimal actions due to a preference for greedy choices [7][20]

Group 2: Model Performance
- Small LLMs (2B) are strongly affected by frequency bias, leading to a lack of exploration, with up to 55% of actions never tried [4][18]
- Large LLMs (27B) show reduced frequency bias but still behave greedily, limiting overall performance [6][18]
- Average action coverage for the largest models was only 45%, a substantial gap from optimal strategies [17]

Group 3: Reinforcement Learning Fine-Tuning
- RLFT adjusts the LLM's reasoning process based on rewards obtained from environment interactions, promoting the selection of actions that yield higher rewards [8][22]
- Results show that RLFT significantly reduces regret across environments, improving LLM performance relative to random baselines [22]
- RLFT mitigates greediness by encouraging exploration, enhancing decision-making capabilities [22]
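To make the greediness failure mode tangible, below is a toy multi-armed-bandit harness that measures action coverage, the statistic the paper reports (roughly 45% even for the largest models). It is a simplified stand-in for the paper's setup: `choose` would be backed by an LLM in the actual experiments, and the greedy policy shown here reproduces the failure, not the RLFT fix.

```python
import random
from collections import Counter
from typing import Callable, Dict

def action_coverage(choose: Callable[[Dict], int], n_arms: int = 10, steps: int = 100) -> float:
    """Fraction of arms an agent ever tries in a toy Bernoulli bandit.

    Low coverage is the 'greediness' failure mode: the agent locks onto an
    early winner and never gathers enough evidence about the other arms.
    """
    true_means = [random.random() for _ in range(n_arms)]
    history: Dict = {"counts": Counter(), "rewards": Counter()}
    for _ in range(steps):
        arm = choose(history)
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        history["counts"][arm] += 1
        history["rewards"][arm] += reward
    return len(history["counts"]) / n_arms

def greedy_policy(history: Dict) -> int:
    """Always exploit the best empirical arm so far -- the behavior under study."""
    counts, rewards = history["counts"], history["rewards"]
    if not counts:
        return 0  # nothing tried yet; pick the first arm
    return max(counts, key=lambda a: rewards[a] / counts[a])

print(f"greedy coverage: {action_coverage(greedy_policy):.0%}")  # typically very low
```

Plugging an exploration bonus (or, in the paper's framing, RLFT with reward-shaped reasoning) into `choose` raises coverage and lowers regret, which is the effect the study quantifies.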
A Survey of Large-Model-Driven Spatial Intelligence: Advances in Embodied Agents, Smart Cities, and Earth Science
" 欧米伽未来研究所 " 关注科技未来发展趋势,研究人类向欧米伽点演化过程中面临的重大机遇与挑战。将不定期推荐和发布世界范围重要科技研究进展和未 来趋势研究。( 点击这里查看欧米伽理论 ) 我们生活在一个由空间构成的世界中。从每天在家居、办公环境或城市街道中的移动,到规划一次跨越山海的旅行,乃至科学家们研究气候变迁的地理模 式、城市扩张的复杂格局,这一切都深刻地依赖于我们对空间的感知、理解和运用能力。这种核心能力,我们称之为"空间智能"。 长久以来,人类凭借自身的感官系统和发达的大脑,不断地探索、适应并改造着周遭的空间环境,演化出了独特的空间认知机制。而今,随着人工智能 (AI)技术的日新月异,特别是大语言模型(LLMs)的横空出世,机器也开始显露出令人瞩目的空间智能潜力。这场由大模型引领的技术浪潮,正以前 所未有的深度和广度,渗透到从微观尺度的机器人导航,到中观尺度的城市规划管理,再到宏观尺度的地球科学研究等诸多领域。 这部报告由清华大学和芬兰赫尔辛基大学共同发布,将带领读者一同深入探究,大模型是如何被赋予"空间感"的?它们在跨越不同尺度的空间智能任务中 扮演着怎样日益重要的角色?以及在迈向更高级空间智能的 ...