Large Language Models

Hierarchical VLA models vs. fully end-to-end VLA: which direction is better for publishing papers?
自动驾驶之心· 2025-07-23 07:32
Core Viewpoint
- The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to the exploration of Vision-Language-Action (VLA) models, suggesting that there are still many opportunities for research in this area [1][2].

Group 1: VLA Research Topics
- The VLA model represents a new paradigm in autonomous driving, integrating vision, language, and action to enhance decision-making capabilities [2][3].
- The evolution of autonomous driving technology can be categorized into three phases: traditional modular architecture, pure visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models aim to improve interpretability and reliability by allowing the model to explain its decisions in natural language, thus increasing transparency and trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge in VLA and develop practical skills in model design and implementation [6][7].
- Participants will engage in a 12-week online group research phase followed by 2 weeks of paper guidance, culminating in a 10-week maintenance period for their research papers [6].
- The course will provide insights into classic and cutting-edge papers, coding implementations, and writing methodologies, ultimately assisting participants in producing a research paper draft [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and basic programming skills [5][9].
- Participants are expected to have access to high-performance computing resources, ideally with multiple high-end GPUs, to facilitate their research [13][14].
- A preliminary assessment will be conducted to tailor the course content to the individual needs of participants, ensuring a focused learning experience [15].

Group 4: Course Highlights and Outcomes
- The course features a "2+1" teaching model, providing comprehensive support from experienced instructors and research mentors [15].
- Participants will gain a thorough understanding of the research process, writing techniques, and submission strategies, enhancing their academic and professional profiles [15][20].
- The expected outcomes include a research paper draft, project completion certificates, and potential recommendation letters based on performance [15].
ICML 2025 | Tsinghua's medical-engineering platform proposes MultiCogEval, a "full-cycle" medical capability evaluation framework for large models
机器之心· 2025-07-23 01:04
Core Viewpoint
- The rapid development of Large Language Models (LLMs) is significantly reshaping the healthcare industry, with these models becoming a new battleground for advanced technology [2][3].

Group 1: Medical Language Models and Their Capabilities
- LLMs possess strong text understanding and generation capabilities, enabling them to read medical literature, interpret medical records, and even generate preliminary diagnostic suggestions based on patient statements, thereby assisting doctors in improving diagnostic accuracy and efficiency [2][3].
- Despite achieving over 90% accuracy on medical question-answering benchmarks like MedQA, the practical application of these models in real clinical settings remains suboptimal, indicating a "high score but low capability" issue [4][5].

Group 2: MultiCogEval Framework
- The MultiCogEval framework was introduced to evaluate LLMs across different cognitive levels, addressing the gap between medical knowledge mastery and clinical problem-solving capabilities [5][6][10].
- This framework assesses LLMs' clinical abilities at three cognitive levels: basic knowledge mastery, comprehensive knowledge application, and scenario-based problem-solving [12][14] (a minimal harness sketch follows this summary).

Group 3: Evaluation Results
- Evaluation results show that while LLMs perform well on low-level tasks (basic knowledge mastery) with accuracy exceeding 60%, their performance declines significantly on mid-level tasks (approximately a 20% drop) and further deteriorates on high-level tasks, with the best model achieving only 19.4% accuracy in full-chain diagnosis [16][17].
- The study found that fine-tuning in the medical domain effectively enhances LLMs' low- and mid-level clinical capabilities, with improvements of up to 15%, but has limited impact on high-level task performance [19][22].

Group 4: Future Implications
- The introduction of the MultiCogEval framework lays a solid foundation for future research and development of medical LLMs, aiming to promote more robust, reliable, and practical applications of AI in healthcare, ultimately contributing to the creation of "trustworthy AI doctors" [21][22].
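To make the three-level evaluation idea concrete, here is a minimal sketch of a generic multi-level evaluation harness. The level names, example tasks, and the `model` callable are illustrative assumptions, not the actual MultiCogEval benchmark or its scoring rules.

```python
# Minimal sketch of a multi-level evaluation harness for a chat-style model.
# The tasks and level names are illustrative placeholders, not MultiCogEval data.
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (prompt, expected short answer)

LEVELS: Dict[str, List[Task]] = {
    "basic_knowledge": [
        ("Which vitamin deficiency causes scurvy? Answer with one word.", "C"),
    ],
    "knowledge_application": [
        ("A patient on warfarin starts a drug that inhibits CYP2C9. "
         "Does bleeding risk go up or down? Answer up or down.", "up"),
    ],
    "scenario_problem_solving": [
        ("A patient has chest pain with ST elevation in leads II, III, and aVF. "
         "Which coronary artery is most likely occluded?", "right coronary artery"),
    ],
}


def evaluate(model: Callable[[str], str]) -> Dict[str, float]:
    """Return per-level accuracy for a model that maps a prompt to a short answer."""
    results = {}
    for level, tasks in LEVELS.items():
        correct = sum(
            expected.lower() in model(prompt).lower() for prompt, expected in tasks
        )
        results[level] = correct / len(tasks)
    return results
```

In practice each level would hold hundreds of items and a stricter answer matcher, but the structure (one task pool per cognitive level, one accuracy per level) mirrors the kind of comparison reported above.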
Kimi K2's official technical report released: 384 experts, trained not by drilling problems but by "retelling things in its own words"
量子位· 2025-07-22 06:39
Core Viewpoint
- Kimi K2 has emerged as a leading open-source model, showcasing significant advancements in capabilities, particularly in code, agent tasks, and mathematical reasoning [4][5].

Group 1: Technical Highlights
- Kimi K2 features a total parameter count of 1 trillion with 32 billion active parameters, demonstrating its advanced capabilities [4].
- The model has achieved state-of-the-art (SOTA) performance on various benchmark tests, including SWE-bench Verified, Tau2, and AceBench [12].
- The Kimi team emphasizes a shift from static imitation learning to Agentic Intelligence, requiring models to autonomously perceive, plan, reason, and act in complex environments [9][10].

Group 2: Core Innovations
- Three core innovations are implemented in Kimi K2:
  1. The MuonClip optimizer, which replaces the traditional Adam optimizer and enabled pre-training on 15.5 trillion tokens without loss spikes [11].
  2. Large-scale Agentic Tool Use data synthesis, generating multi-turn tool-usage scenarios across hundreds of domains and thousands of tools [12].
  3. A universal reinforcement learning framework that extends alignment from static to open domains [12].

Group 3: Pre-training and Post-training Phases
- During the pre-training phase, Kimi K2 optimizes both the optimizer and the data, utilizing the MuonClip optimizer to enhance training stability and efficiency [21][22].
- The training data covers four main areas: web content, code, mathematics, and knowledge, all subjected to strict quality screening [24].
- The post-training phase involves supervised fine-tuning and reinforcement learning, with a focus on generating high-quality training data through a rejection sampling mechanism [30][31] (a minimal sketch of the general idea follows this summary).

Group 4: Reinforcement Learning Process
- The reinforcement learning process includes creating verifiable reward environments for objective evaluation of model performance [33].
- A self-critique reward mechanism is introduced, allowing the model to evaluate its own outputs against predefined standards [34].
- The model generates diverse agentic tasks and tool combinations, ensuring a comprehensive training approach [35].

Group 5: Infrastructure and Performance
- Kimi K2's training relies on a large-scale, high-bandwidth GPU cluster composed of NVIDIA H800s, ensuring efficient training across various resource scales [38].
- Each node is equipped with 2TB of memory, facilitating high-speed interconnectivity among GPUs [39].
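For readers unfamiliar with rejection sampling as a way to build post-training data, the sketch below shows the general idea only; `generate_candidates`, `judge_score`, the candidate count, and the threshold are hypothetical placeholders, not Kimi K2's actual pipeline.

```python
# Minimal sketch of rejection sampling for supervised fine-tuning data.
# Both callables are hypothetical stand-ins: one samples k candidate answers,
# the other scores a (prompt, answer) pair. This is an illustration of the
# general technique, NOT the Kimi K2 implementation.
from typing import Callable, List, Tuple


def rejection_sample_sft(
    prompts: List[str],
    generate_candidates: Callable[[str, int], List[str]],
    judge_score: Callable[[str, str], float],
    k: int = 8,
    threshold: float = 0.7,
) -> List[Tuple[str, str]]:
    """Keep only the best-scoring candidate per prompt, and only if it clears a bar."""
    kept = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, k)
        scored = [(judge_score(prompt, ans), ans) for ans in candidates]
        best_score, best_answer = max(scored)
        if best_score >= threshold:
            kept.append((prompt, best_answer))  # becomes a fine-tuning pair
    return kept
```

The self-critique reward described in Group 4 plays a similar role at RL time: a scoring signal decides which of the model's own outputs are reinforced rather than which are kept as training pairs.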
New technology trends seen at the 2025 International Near-Infrared Spectroscopy Conference in Italy
仪器信息网· 2025-07-22 03:24
Core Viewpoint
- The article discusses the advancements in Near-Infrared Spectroscopy (NIRS) technology, highlighting innovations in hardware, data processing methods, and diverse applications across various industries, indicating a trend towards more intelligent and accessible analytical tools for precision agriculture, green industry, and personalized medicine [1].

Group 1: Innovations in Hardware and Portable Applications
- The development of miniaturized, intelligent, and cost-effective NIRS devices has expanded field detection applications, with a focus on balancing portability and performance [3][4].
- Notable examples include a handheld NIRS device developed by an Australian company that integrates MEMS/InGaAs sensor modules, significantly reducing costs while maintaining sensitivity and resolution [3].
- Practical applications of portable devices include food safety assessments, drug testing, and quality control in coffee production, demonstrating their effectiveness in real-world scenarios [5].

Group 2: Integration with Cloud Computing and IoT
- The integration of portable NIRS with RFID, blockchain, and IoT has enabled the creation of comprehensive traceability systems, enhancing the digital supply chain [6].
- A New Zealand company successfully replaced 40 online and offline spectrometers with a standardized NIR network, ensuring data consistency throughout the production chain [6].

Group 3: Development of Specialized Spectrometers
- Innovations in specialized spectrometers, such as the MiniSmartSensor developed by SINTEF in Norway, allow for precise subsurface detection in food quality analysis [7].

Group 4: Advances in Data Processing and Model Building
- The conference highlighted the shift from traditional PLS regression to more adaptive modeling strategies, improving robustness and interpretability in complex sample analysis [9] (a minimal calibration sketch follows this summary).
- New methodologies, such as the "first principles" approach and data augmentation techniques, have been introduced to enhance model performance and address small-sample calibration challenges [9][10].

Group 5: Expansion of Application Scenarios
- NIRS technology is increasingly applied across diverse fields, including bioenergy optimization, agricultural quality assessment, and industrial applications, showcasing its cross-industry penetration [18][19].
- Noteworthy applications include real-time monitoring of biogas production and non-destructive quality assessment of organic oranges, demonstrating the versatility of NIRS [18].

Group 6: Automation and Intelligent Applications
- The introduction of automation technologies has significantly improved the efficiency of NIRS applications, transitioning from laboratory settings to field and industrial environments [21].
- Examples include collaborative robots for automated wood sample processing and drone systems for real-time vineyard monitoring [23][24].

Group 7: Environmental and Medical Innovations
- NIRS technology is favored in environmental monitoring and healthcare due to its green characteristics, enabling efficient detection of microplastics and real-time dialysis monitoring [28][29].

Group 8: Multimodal Data Fusion and Future Prospects
- The integration of multimodal data fusion is a key development direction for NIRS, enhancing model accuracy and applicability [36].
- Future advancements are expected to focus on smaller, smarter sensors, the fusion of physical models with data-driven approaches, and the expansion of NIRS applications into complex scenarios [41][42].
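As background for the PLS-regression baseline mentioned in Group 4, the sketch below shows a generic calibration workflow. The spectra are randomly generated, and the sample counts and number of latent variables are assumptions for illustration, not conference data.

```python
# Minimal PLS calibration sketch on synthetic "NIR-like" spectra.
# Real work would use measured spectra and laboratory reference values;
# everything here is randomly generated for illustration only.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_channels = 120, 700                       # spectra over 700 wavelength channels
X = rng.normal(size=(n_samples, n_channels))           # synthetic absorbance spectra
true_weights = rng.normal(size=n_channels)
y = X @ true_weights * 0.01 + rng.normal(scale=0.1, size=n_samples)  # synthetic analyte values

pls = PLSRegression(n_components=10)                   # latent-variable count is a tuning choice
scores = cross_val_score(pls, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())

pls.fit(X, y)
print(pls.predict(X[:5]).ravel())                      # predicted analyte values for new spectra
```

The adaptive strategies discussed at the conference typically replace or augment this fixed linear model, but cross-validated calibration against reference values remains the common evaluation pattern.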
A comprehensive guide to Context Engineering distilled from 1,400 research papers | Jinqiu Select
锦秋集· 2025-07-21 14:03
Core Insights
- The article discusses the emerging field of Context Engineering, emphasizing the need for a systematic theoretical framework to complement practical experiences shared by Manus' team [1][2].
- A comprehensive survey titled "A Survey of Context Engineering for Large Language Models" has been published, analyzing over 1,400 research papers to establish a complete technical system for Context Engineering [1][2].

Context Engineering Components
- Context Engineering is built on three interrelated components: Information Retrieval and Generation, Information Processing, and Information Management, forming a complete framework for optimizing context in large models [2].
- The first component, Context Retrieval and Generation, focuses on engineering methods to effectively acquire and construct context information for models, including practices like Prompt Engineering, external knowledge retrieval, and dynamic context assembly [2].

Prompting Techniques
- Prompting serves as the starting point for model interaction, where effective prompts can unlock deeper capabilities of the model [3].
- Zero-shot prompting provides direct instructions relying on pre-trained knowledge, while few-shot prompting offers a few examples to guide the model in understanding task requirements [4].

Advanced Reasoning Frameworks
- For complex tasks, structured thinking is necessary, with Chain-of-Thought (CoT) prompting models to think step by step, significantly improving accuracy on complex tasks [5].
- Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT) further enhance reasoning by allowing exploration of multiple paths and dependencies, improving success rates in tasks requiring extensive exploration [5].

Self-Refinement Mechanisms
- Self-Refinement allows models to iteratively improve their outputs through self-feedback without requiring additional supervised training data [8][9].
- Techniques like N-CRITICS and Agent-R enable models to evaluate and correct their reasoning paths in real time, enhancing output quality [10][11].

External Knowledge Retrieval
- External knowledge retrieval, particularly through Retrieval-Augmented Generation (RAG), addresses the static nature of model knowledge by integrating dynamic information from external databases [12][13] (a minimal retrieval sketch follows this summary).
- Advanced RAG architectures introduce adaptive retrieval mechanisms and hierarchical processing strategies to enhance information retrieval efficiency [14][15].

Context Processing Challenges
- Processing long contexts presents significant computational challenges due to the quadratic complexity of Transformer self-attention mechanisms [28].
- Innovations like State Space Models and Linear Attention aim to reduce computational complexity, allowing models to handle longer sequences more efficiently [29][30].

Context Management Strategies
- Effective context management is crucial for organizing, storing, and utilizing information, addressing issues like context overflow and collapse [46][47].
- Memory architectures inspired by operating systems and cognitive models are being developed to enhance the memory capabilities of language models [48][50].

Tool-Integrated Reasoning
- Tool-Integrated Reasoning transforms language models from passive text generators into active agents capable of interacting with the external world through function calling and integrated reasoning frameworks [91][92].
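To make the RAG step under External Knowledge Retrieval concrete, the sketch below assembles a retrieval-augmented prompt. TF-IDF similarity stands in for a learned embedding model, and the corpus and prompt template are illustrative assumptions, not the survey's material.

```python
# Minimal retrieval-augmented prompt assembly sketch.
# TF-IDF similarity stands in for a learned embedding model; the corpus and
# prompt template are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Chain-of-Thought prompting asks the model to reason step by step.",
    "Retrieval-Augmented Generation grounds answers in external documents.",
    "State Space Models reduce the cost of processing long sequences.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)


def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most similar documents and pack them into the context window."""
    q_vec = vectorizer.transform([question])
    sims = cosine_similarity(q_vec, doc_vectors).ravel()
    top_docs = [corpus[i] for i in sims.argsort()[::-1][:top_k]]
    context = "\n".join(f"- {d}" for d in top_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


print(build_prompt("How does RAG keep model knowledge up to date?"))
```

The adaptive and hierarchical RAG variants in the survey replace the fixed similarity search with learned retrievers and multi-stage routing, but the final step, packing retrieved evidence into the prompt, is the same.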
"Godfather of AI" Hinton's latest interview: there is nothing AI cannot replicate, and humans are losing their last claim to uniqueness
36Kr· 2025-07-21 08:19
On July 21, Turing Award winner Geoffrey Hinton, widely known as the "Godfather of AI," held a fireside chat with Nick Frosst, co-founder of AI startup Cohere. Frosst, who was Hinton's first hire at the Google Brain lab in Toronto, has since become a leading figure in AI entrepreneurship.

In the conversation, the two experts dug into frontier questions in AI, including: Do large language models truly understand human language? Can digital intelligence genuinely surpass biological intelligence? Which fields will become AI's most promising application scenarios? And what is tech giants' real attitude toward regulation? They also focused on the dual dangers posed by AI and discussed how to build effective safety safeguards.

Core points from Hinton and Frosst include:

4. Current models cannot continuously learn from experience the way humans do; they acquire knowledge statically in two stages (pre-training plus reinforcement learning), and updating that knowledge still requires retraining the underlying model.
5. Frosst and Hinton both believe the era of "language as the operating system" is approaching: through natural language alone, users will be able to direct office systems to carry out complex tasks.
6. Hinton stressed AI's dual risks: in the short term it could be used to manipulate elections or build weapons; in the long term it could "take over the world" by surpassing human intelligence.
7. Hinton argued that large models, by compressing the number of connections and finding deep links between pieces of knowledge, display genuine "creativity," surpassing most humans.
8. Hinton believes that within 5 years ...
After interviewing many end-to-end candidates, plenty of them still can't keep the concepts straight...
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- End-to-End Autonomous Driving is a key algorithm for intelligent driving mass production, with significant salary potential for related positions, and it has evolved into various technical directions since the introduction of UniAD [2][4].

Group 1: Technical Directions
- End-to-End Autonomous Driving can be categorized into one-stage and two-stage approaches, with various subfields emerging under each category [2][4].
- The core advantage of end-to-end systems is the direct modeling from sensor input to vehicle planning/control information, avoiding the error accumulation seen in modular methods [2] (a minimal sketch follows this summary).
- Notable algorithms include PLUTO for two-stage end-to-end, UniAD for perception-based one-stage, OccWorld for world-model-based one-stage, and DiffusionDrive for diffusion-model-based one-stage [4].

Group 2: Industry Trends
- The demand for VLA/VLM algorithm experts is increasing, with salary ranges for positions requiring 3-5 years of experience being between 40K-70K [9].
- The industry is witnessing a shift towards large-model algorithms, with companies focusing on VLA as the next generation of autonomous driving solutions [8][9].

Group 3: Course Offerings
- A new course titled "End-to-End and VLA Autonomous Driving" is being offered to help individuals understand the complexities of end-to-end algorithms and their applications [15][28].
- The course covers various topics, including background knowledge, two-stage end-to-end, one-stage end-to-end, and practical applications of reinforcement learning [20][22][24].
- The course aims to provide a comprehensive understanding of the end-to-end framework, including key technologies like BEV perception, multi-modal large models, and diffusion models [31].
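To illustrate what "direct modeling from sensor input to planning output" means in the one-stage setting, here is a deliberately tiny sketch. The layer sizes, input resolution, and waypoint count are arbitrary assumptions; this is not UniAD, PLUTO, OccWorld, or DiffusionDrive.

```python
# Minimal sketch of a one-stage end-to-end driving policy: camera image in,
# future trajectory waypoints out. Sizes are arbitrary illustrations only.
import torch
import torch.nn as nn


class TinyEndToEndPlanner(nn.Module):
    def __init__(self, num_waypoints: int = 6):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.encoder = nn.Sequential(            # stands in for a BEV / perception backbone
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.planner = nn.Sequential(             # maps scene features to (x, y) waypoints
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, num_waypoints * 2),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.encoder(image)
        return self.planner(features).view(-1, self.num_waypoints, 2)


planner = TinyEndToEndPlanner()
waypoints = planner(torch.randn(1, 3, 128, 256))  # one front-camera frame
print(waypoints.shape)                            # torch.Size([1, 6, 2])
```

Real systems add multi-camera fusion, temporal context, and either a downstream controller (two-stage) or direct control outputs (one-stage), but the single differentiable path from pixels to trajectory is the defining feature.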
Large models' confidence collapses! Google DeepMind confirms that opposing opinions make GPT-4o readily abandon correct answers
量子位· 2025-07-20 05:08
Core Viewpoint
- The research conducted by Google DeepMind and University College London reveals that large language models (LLMs) exhibit the conflicting behaviors of being both confident and self-doubting, driven by their sensitivity to opposing feedback [2][3][21].

Group 1: Model Behavior
- LLMs tend to maintain their initial answers when they can see them, reflecting the human tendency to defend a viewpoint after committing to a decision [11][12].
- When the initial answer is hidden, however, LLMs are far more likely to change their answers, indicating an excessive sensitivity to opposing suggestions, even when those suggestions are incorrect [13][21].
- This behavior diverges from human cognition, as humans typically do not abandon correct conclusions simply because of misleading information [15][21].

Group 2: Experimental Design
- The study used a two-round setup in which an LLM first answered a binary-choice question and then received feedback from a fictional "suggestion" LLM [7][8] (a minimal sketch of the protocol follows this summary).
- The key variable was whether the initial answer remained visible to the responding LLM, which significantly affected the final decision-making process [9][10].

Group 3: Reasons for Inconsistent Behavior
- The inconsistency in LLM responses is attributed to several factors:
  - Over-reliance on external feedback, a byproduct of reinforcement learning from human feedback (RLHF), leaving the model without independent judgment about the reliability of the information it receives [19][21].
  - Decision-making based on statistical pattern matching rather than logical reasoning, making LLMs susceptible to misleading signals [19][21].
  - The absence of a robust memory mechanism that would allow for deeper reasoning, so that when the initial answer is not visible the model is easily swayed by opposing suggestions [21][22].
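For concreteness, the sketch below approximates the two-round protocol described in Group 2. The `ask_model` callable, prompt wording, and condition handling are hypothetical placeholders, not the paper's actual materials.

```python
# Minimal sketch of a two-round answer-change experiment.
# `ask_model` is a hypothetical stand-in for any chat-completion call; the
# prompts approximate the setup summarized above, not the paper's materials.
from typing import Callable, Dict, Tuple


def run_trial(
    ask_model: Callable[[str], str],   # prompt -> model reply ("A" or "B")
    question: str,
    options: Tuple[str, str],          # e.g. ("A) Paris", "B) Lyon")
    opposing_advice: str,              # feedback attributed to a second "suggestion" model
    show_initial_answer: bool,
) -> Dict[str, object]:
    first_prompt = f"{question}\n{options[0]}\n{options[1]}\nAnswer with A or B."
    initial = ask_model(first_prompt)

    # Round 2: present opposing feedback, optionally hiding the model's own first answer.
    reminder = f"Your previous answer was {initial}.\n" if show_initial_answer else ""
    second_prompt = (
        f"{question}\n{options[0]}\n{options[1]}\n"
        f"{reminder}"
        f"Another assistant suggests: {opposing_advice}\n"
        "Give your final answer, A or B."
    )
    final = ask_model(second_prompt)
    return {"initial": initial, "final": final, "changed": initial != final}
```

Comparing the rate of `changed` between the visible and hidden conditions, across trials with correct and incorrect opposing advice, is the kind of contrast the study reports.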
Baidu Group-SW (09888): Core advertising business under pressure from the AI search transformation, while Luobo Kuaipao continues to lead the Robotaxi industry
Soochow Securities International· 2025-07-18 14:00
Investment Rating
- The report maintains a "Buy" rating for Baidu Group [2][7].

Core Views
- Baidu's core advertising business is expected to face pressure due to AI search transformations, with a projected revenue decline of 16.3% year-on-year in Q2 2025 [7].
- Baidu's Robotaxi service, "Luobo Kuaipao," is leading the global market, with a significant increase in order volume, up 75% year-on-year to 1.44 million in Q1 2025 [7].
- The company's intelligent cloud business is experiencing rapid growth driven by the demand for generative AI and large language models, with Q1 2025 cloud service revenue expected to grow by 42% year-on-year [7].
- The overall revenue forecast for 2025-2027 has been adjusted to reflect a decline of 5.2% in 2025, followed by growth of 4.4% and 4.8% in 2026 and 2027, respectively [7].
- The target price for Baidu has been revised down to HKD 95.15 based on DCF valuation [7].

Financial Projections
- Revenue projections for Baidu are as follows:
  - 2024: 133,125 million CNY
  - 2025: 126,265 million CNY
  - 2026: 131,853 million CNY
  - 2027: 138,172 million CNY [2][12]
- Net profit projections are:
  - 2024: 23,760 million CNY
  - 2025: 18,324 million CNY
  - 2026: 20,200 million CNY
  - 2027: 22,172 million CNY [2][12]
- The P/E ratio is projected to be 10.9 in 2024, increasing to 14.1 in 2025, and then decreasing to 11.7 by 2027 [2][12].

Business Segments
- The core online marketing service revenue is expected to decline by 15.3% in 2025, while cloud service revenue is projected to grow by 22.2% [8].
- The iQIYI segment is expected to see a slight revenue decline of 1.0% in 2025 [8].

Valuation Metrics
- The report provides a DCF valuation breakdown, indicating a total enterprise value of approximately 370.45 billion CNY, with equity value at 287.44 billion CNY [9][10].
Superpowers in Big History | Book Recommendation
腾讯研究院· 2025-07-18 08:18
Core Viewpoint
- The article discusses the evolution of intelligence from early mammals to modern AI, emphasizing that intelligence can compensate for physical limitations and that historical events significantly influence the development of intelligence [3][4][11].

Group 1: Evolution of Intelligence
- The first breakthrough in brain evolution occurred 550 million years ago, allowing organisms to differentiate between stimuli and develop basic emotional responses with only a few hundred neurons [4].
- The second breakthrough involved the advanced use of dopamine in vertebrates, enabling them to quantify the likelihood of rewards and develop curiosity through complex actions [5].
- The third breakthrough was the development of the neocortex in mammals, which allowed for imagination and planning, akin to slow thinking as described by Daniel Kahneman [5][6].

Group 2: AI and Intelligence
- AI has significantly improved through reinforcement learning that rewards processes rather than just outcomes, allowing learning from each step rather than waiting for the end result [5].
- Current AI models, particularly large language models, demonstrate an understanding of language beyond mere memorization, indicating a significant advancement in AI capabilities [7][10].
- Potential future breakthroughs in AI may involve combining human and AI intelligence, enabling AI to simulate multiple worlds or understand complex rules in novel ways [11][12].

Group 3: Historical Context of Breakthroughs
- Historical events, such as the asteroid impact that led to the extinction of dinosaurs, have provided opportunities for the evolution of mammals and the development of intelligence [3][15].
- The article suggests that significant changes in the world often arise from unexpected and radical shifts rather than gradual improvements [16][17].