Large Language Models
Large models take on aero-engines: tackling complex time-series problems and beating ChatGPT-4o to reach SOTA | SJTU, Shanghai Innovation Institute, Fudan
量子位· 2025-06-28 04:42
Contributed by the ITFormer team | QbitAI (量子位)

Time-series data analysis is critical in fields such as industrial monitoring and medical diagnosis. In a complex industrial setting like aero-engine monitoring, engineers must analyze massive multi-channel sensor data to judge equipment condition and make maintenance decisions. Existing research, however, mostly targets single tasks such as classification or forecasting, which differs markedly from real industrial practice, where experts interact and make decisions through complex natural-language exchanges.

Taking aero-engine operation and maintenance as the backdrop, Professor Li Yuanxiang's team at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, together with the Shanghai Innovation Institute and the School of Data Science at Fudan University, propose ITFormer, an efficient and transferable time-series-language bridging architecture. It abstracts the expert diagnostic process into four cognitive levels (understanding, perception, reasoning, and decision-making) and, for the first time, systematically defines this as a "time-series question answering" task paradigm.

Based on NASA aero-engine data, the team built EngineMT-QA, a dataset of more than 110,000 question-answer pairs. Its task design closely mirrors the experts' cognitive workflow, providing the first standardized benchmark for evaluating models' reasoning ability in realistic industrial scenarios.

Results show that ITFormer's modular design fuses time-series data with large language models efficiently: by training less than 1% additional parameters, it achieves superior performance and strong transferability on general time-series QA datasets, demonstrating excellent "plug-and-play" behavior. It can seamlessly adapt to Patch ...
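For readers curious how a "bridge" of this kind can be so parameter-light, here is a minimal, hypothetical PyTorch sketch of the general idea: a small trainable adapter maps patch embeddings from a frozen time-series encoder into an LLM's token space, and only the adapter is trained. The module names, dimensions, and query-token design are illustrative assumptions, not ITFormer's actual implementation.

```python
# Hypothetical sketch of a time-series-to-LLM bridge: a frozen patch-based encoder and a
# frozen LLM, with only this small adapter being trainable. Not the authors' code.
import torch
import torch.nn as nn

class TimeSeriesAdapter(nn.Module):
    """Compresses patch embeddings from a frozen time-series encoder into a few
    "time tokens" in the LLM's embedding space; only this module is trained."""
    def __init__(self, ts_dim: int = 256, llm_dim: int = 4096, n_query: int = 32):
        super().__init__()
        # Learnable query tokens cross-attend to the time-series patches, compressing an
        # arbitrary-length signal into a fixed number of LLM-space tokens.
        self.queries = nn.Parameter(torch.randn(n_query, llm_dim) * 0.02)
        self.kv_proj = nn.Linear(ts_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, ts_patches: torch.Tensor) -> torch.Tensor:
        # ts_patches: (batch, n_patches, ts_dim) from a frozen patch-based encoder
        kv = self.kv_proj(ts_patches)
        q = self.queries.unsqueeze(0).expand(ts_patches.size(0), -1, -1)
        time_tokens, _ = self.attn(q, kv, kv)   # (batch, n_query, llm_dim)
        return time_tokens                      # prepended to the question's text embeddings

# Example: 140 patches of multi-channel engine signals become 32 LLM-space tokens.
adapter = TimeSeriesAdapter()
dummy_patches = torch.randn(2, 140, 256)
print(adapter(dummy_patches).shape)             # torch.Size([2, 32, 4096])
```

An adapter like this typically holds on the order of millions of parameters, a small fraction of a multi-billion-parameter LLM, which is consistent with the "less than 1% additional parameters" framing above.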
Why hasn't DeepSeek-R2 been released yet?
量子位· 2025-06-27 08:09
Core Viewpoint
- The release of DeepSeek-R2 has been delayed due to CEO Liang Wenfeng's dissatisfaction with its performance and a shortage of Nvidia H20 chips, which are critical for its development [1][2][4].

Development Timeline
- Anticipation for R2 began after the release of the DeepSeek-V3 model in December last year, which was considered a benchmark for cost-performance [5].
- An upgrade to V3 was announced in March 2025, leading to speculation that R2 would be released in April [11].
- Despite the release of a paper on scaling laws in early April, there has been no official update on R2 since then [12][16].

Technical Specifications
- R1's training utilized 30,000 H20 chips, 10,000 H800 chips, and 10,000 H100 chips, indicating the scale of computational resources R2 would require [3].
- Leaked parameters for R2 suggested it would have 1.2 trillion parameters and be trained on 5.2 petabytes of data, although the authenticity of these claims remains uncertain [17].

Community Reactions
- Following news of the delay, community responses varied: some believe the delay is justified, while others speculate that R2 may wait for the release of V4 [26][30].
What exactly is this year's red-hot goal-oriented navigation? What are the technical routes from goal search to goal reaching?
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- Goal-Oriented Navigation empowers robots to autonomously complete navigation tasks based on goal descriptions alone, marking a significant shift from traditional visual-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, resting on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-Oriented Navigation requires robots to explore and plan paths in unfamiliar 3D environments using only a goal description such as coordinates, an image, or natural language [2].
- The technology has been industrialized in several verticals, including delivery, healthcare, and hospitality, improving service efficiency [3].

Group 2: Technological Evolution
- The evolution of Goal-Oriented Navigation can be divided into three generations:
  - First generation: end-to-end methods built on reinforcement learning and imitation learning, achieving breakthroughs in Point Navigation and closed-set image navigation tasks [5].
  - Second generation: modular methods that explicitly construct semantic maps, decomposing the task into exploration and goal localization (a minimal sketch of this pipeline follows this summary) [5].
  - Third generation: integration of large language models (LLMs) and visual-language models (VLMs) to enhance knowledge reasoning and open-vocabulary target matching [7].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation, and of Goal-Oriented Navigation in particular, demands knowledge from multiple fields, making it challenging for newcomers to enter the domain [9].
- A new course has been developed to address these challenges, focusing on quick entry, building a research framework, and combining theory with practice [10][11][12].

Group 4: Course Structure
- The course covers the theoretical foundations and technical lineage of Goal-Oriented Navigation, including task definitions and evaluation benchmarks [15].
- It also goes deep into the Habitat simulation ecosystem, end-to-end navigation methodologies, modular navigation architectures, and LLM/VLM-driven navigation systems [16][18][20][22].
- A significant project focuses on reproducing the VLFM algorithm and deploying it in real-world scenarios [24].
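As a rough illustration of the second-generation "modular" pipeline described above (explore, build a semantic map, then localize and approach the goal), here is a minimal Python skeleton. The environment interface (`observe`, `frontier_action`, `plan_toward`, `distance_to`, `step`) is a hypothetical stand-in, not the Habitat API or the course's own code.

```python
# Minimal, illustrative skeleton of a modular goal-navigation loop: explore while building
# a semantic map, then switch to goal-directed planning once the target is localized.
from dataclasses import dataclass, field

@dataclass
class SemanticMap:
    detections: dict = field(default_factory=dict)  # object label -> (x, y) estimate

    def update(self, observation: dict) -> None:
        # Record where labelled objects were seen (stand-in for real detection + projection).
        self.detections.update(observation)

    def locate(self, goal_label: str):
        return self.detections.get(goal_label)

def navigate(env, goal_label: str, max_steps: int = 500) -> bool:
    smap = SemanticMap()
    for _ in range(max_steps):
        obs = env.observe()                    # e.g. {"chair": (3.2, 1.1), ...}
        smap.update(obs)
        target = smap.locate(goal_label)
        if target is None:
            env.step(env.frontier_action())    # exploration phase: head to a map frontier
        elif env.distance_to(target) < 0.5:
            return True                        # success: within 0.5 m of the goal
        else:
            env.step(env.plan_toward(target))  # goal-directed phase: plan a path to the target
    return False
```

The point of the sketch is the two-phase structure: exploration and goal localization are separate modules, which is exactly what distinguishes the second generation from end-to-end policies.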
Zhang Yaqin: EV brands may consolidate, and 10% of new cars will have L4 autonomous driving capability by 2030
Sou Hu Cai Jing· 2025-06-26 10:04
Group 1
- The 16th Annual Meeting of the New Champions (Summer Davos Forum) is being held in Tianjin from June 24 to 26, 2025, with Zhang Yaqin, foreign academician of the Chinese Academy of Engineering and dean of the Institute for AI Industry Research (AIR) at Tsinghua University, attending and sharing insights [2]
- Zhang Yaqin predicts that the autonomous driving sector is approaching a "DeepSeek moment," pointing to substantial progress and commercialization of robotaxis in cities such as San Francisco, Los Angeles, Austin, and Tokyo, as well as by Tesla [2]
- In China, Baidu's Apollo Go has been in operation the longest, covering Wuhan with more than 1,000 vehicles and marking a new phase for the industry, alongside efforts from companies such as WeRide [2]

Group 2
- Zhang Yaqin stresses two core goals, safety and economic efficiency: making autonomous vehicles ten times safer than human drivers, which would eliminate roughly 90% of accidents caused by human error, and transforming vehicle economics by removing driver costs, potentially doubling operational efficiency [3]
- The development of generative AI and large language models is helping to address two major challenges in autonomous driving: processing and understanding vast amounts of data, and enabling end-to-end training of decision models, which simplifies the system while maintaining safety boundaries [3]
- Zhang Yaqin forecasts that by 2030, 10% of new vehicle shipments will have L4 autonomous driving capability, serving both the robotaxi and consumer markets, and notes that the electric vehicle ecosystem still needs better charging infrastructure and sounder competition rules [4]
How can a lifelike 3D digital human run in real time on a phone? MNN-TaoAvatar is now open source!
机器之心· 2025-06-25 00:46
Core Viewpoint
- TaoAvatar is a breakthrough 3D digital human technology developed by Alibaba's Taobao Meta technology team, enabling real-time rendering and AI dialogue on mobile and XR devices and providing users with a realistic virtual interaction experience [1][8].

Group 1: Technology Overview
- TaoAvatar uses advanced 3D Gaussian splatting to create lifelike full-body avatars that capture intricate facial expressions and gestures, as well as details such as clothing folds and hair movement [8].
- The technology significantly reduces the cost and increases the efficiency of digital human modeling, enabling large-scale applications [9].
- MNN-TaoAvatar is an open-source 3D digital human application that integrates multiple leading AI technologies, allowing natural voice interaction with digital humans on mobile devices [10].

Group 2: Performance Metrics
- The application runs efficiently on mobile devices, with key metrics for each model as follows (the RTF figures are interpreted in the note after this summary) [16][17][18]:
  - ASR (automatic speech recognition): model size 281.65 MB, RTF 0.18
  - LLM (large language model): model size 838.74 MB, pre-fill speed 165 tokens/s, decode speed 41.16 tokens/s
  - TTS (text-to-speech): model size 1.34 GB, RTF 0.58
  - A2BS (audio-to-blendshape): model size 368.71 MB, RTF 0.34
  - NNR (rendering output): model size 138.40 MB, rendering frame rate 60 FPS

Group 3: Development and Optimization
- MNN-TaoAvatar is built on the MNN engine, which supports the various algorithm modules and boosts the performance of AI applications in real-time scenarios [23][30].
- The MNN-LLM module shows strong CPU performance, with pre-fill speed improved by 8.6 times and decoding speed by 2.3 times compared with llama.cpp [34].
- The MNN-NNR rendering engine uses optimizations such as data synchronization and scheduling to keep rendering efficient, achieving smooth 60 FPS output even when upstream updates arrive at a lower frequency [40][45].

Group 4: Hardware Requirements
- Recommended hardware for MNN-TaoAvatar includes a device with a Qualcomm Snapdragon 8 Gen 3 or equivalent CPU, at least 8 GB of RAM, and 5 GB of storage for model files [51].
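A quick aid to reading the RTF (real-time factor) figures above: RTF is processing time divided by the duration of the audio being processed, so any value below 1.0 means that stage keeps up with real time. The small Python snippet below is illustrative only (it is not part of MNN-TaoAvatar) and uses the values reported in the summary.

```python
# Illustrative helper: RTF = processing time / audio duration, so values below 1.0
# mean the stage keeps up with real-time audio. The RTF values are those reported above.
def processing_time(rtf: float, audio_seconds: float) -> float:
    """Seconds a stage needs to process `audio_seconds` of audio at the given RTF."""
    return rtf * audio_seconds

reported_rtf = {"ASR": 0.18, "TTS": 0.58, "A2BS": 0.34}
for stage, rtf in reported_rtf.items():
    t = processing_time(rtf, audio_seconds=10.0)
    print(f"{stage}: RTF {rtf:.2f} -> {t:.1f}s to process 10s of audio "
          f"(keeps up with real time: {rtf < 1.0})")
```

On the reported numbers, every audio stage finishes well within the audio's own duration, which is what makes an end-to-end voice-to-avatar loop feasible on a phone.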
What exactly is goal-oriented navigation in embodied AI? What are the technical routes from goal search to goal reaching?
具身智能之心· 2025-06-24 14:09
Core Insights
- Goal-Oriented Navigation empowers robots to autonomously complete navigation tasks based on goal descriptions, marking a significant shift from traditional visual-language navigation [2]
- The technology has been successfully deployed in several verticals, enhancing service efficiency in delivery, healthcare, and hospitality [3]
- The evolution of Goal-Oriented Navigation can be categorized into three generations, each with distinct methodologies and advancements [5][7]

Group 1: Technology Overview
- Goal-Oriented Navigation is a key aspect of embodied navigation, relying on language understanding, environmental perception, and path planning [2]
- The transition from explicit instructions to autonomous decision-making involves semantic parsing, environmental modeling, and dynamic decision-making [2]
- The technology has been integrated into delivery robots, service robots in healthcare and hospitality, and humanoid robots for domestic and industrial applications [3]

Group 2: Technical Evolution
- The first generation focuses on end-to-end methods using reinforcement and imitation learning, achieving breakthroughs in Point Navigation and closed-set image navigation tasks [5]
- The second generation employs modular methods that explicitly construct semantic maps, enhancing performance in zero-shot object navigation tasks [5]
- The third generation integrates large language models (LLMs) and visual-language models (VLMs) to improve exploration strategies and open-vocabulary target-matching accuracy (a sketch of this matching step follows this summary) [7][8]

Group 3: Challenges and Learning Path
- The complexity of embodied navigation requires knowledge across multiple domains, making it challenging for newcomers to grasp the necessary concepts [10]
- A new course has been developed to address these challenges, focusing on the practical applications and theoretical foundations of Goal-Oriented Navigation [11][12][13]
- The course aims to build a comprehensive understanding of the technology stack, including end-to-end reinforcement learning, modular semantic map construction, and LLM/VLM integration methods [30]
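To make the third-generation "open-vocabulary target matching" idea concrete, here is a minimal sketch that scores candidate camera views against a free-form goal description with a CLIP-style vision-language model. The Hugging Face API and checkpoint name are illustrative assumptions, not necessarily what the course or any particular navigation system uses.

```python
# Minimal sketch of open-vocabulary target matching: score how well each candidate view
# matches a free-form goal description, then steer toward the best-scoring view.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint choice; any CLIP-style image-text model would do.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def match_scores(goal_text: str, views: list[Image.Image]) -> torch.Tensor:
    """Similarity scores between one goal description and each candidate camera view."""
    inputs = processor(text=[goal_text], images=views, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image has shape (n_views, 1); higher means a better match to the goal.
    return out.logits_per_image.squeeze(-1)

# An agent can then head toward the viewpoint (or frontier) with the highest score,
# which is the basic idea behind VLM-guided goal search.
```

Because the goal is expressed as text and matched in a shared embedding space, the target vocabulary is open: "a red armchair near the window" works without any class being defined in advance.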
The US AI war in one read: the "Big Five tech giants" versus the "three AI dragons"
硬AI· 2025-06-24 12:28
Core Viewpoint
- The article highlights the intense competition in the AI arms race between traditional tech giants and emerging AI companies, with Meta's aggressive talent acquisition reflecting the urgency of the situation [1][2].

Group 1: Apple
- Apple has faced significant setbacks in its AI initiatives, particularly with the Apple Intelligence project; while it retains hardware advantages, it needs deeper AI collaborations [4][5].
- Its core business remains unaffected by AI threats for now, as AI applications still rely on Apple devices for access [4].
- Apple should focus on building the best hardware for the AI era and invest in robotics and home automation to maintain its competitive edge [5].

Group 2: Google
- Google has a leading position in AI infrastructure, with its Gemini model excelling in media creation, but its core search business faces disruptive threats from conversational AI [6][7].
- The company benefits from vast data resources and distribution channels, particularly through its Android system, which could challenge Apple's dominance in the high-end market [7].
- Google is working to turn AI from a disruptive technology into an enhancement for its search capabilities [7].

Group 3: Meta
- Meta's strategic positioning is solid, focusing on personalized content and generative advertising, but it faces execution challenges and risks from the competition for user attention [8].
- The urgency of Meta's talent recruitment indicates recognition of significant threats to its core business from AI developments [8].

Group 4: Microsoft
- Microsoft remains in a strong position but faces new challenges as tensions with OpenAI grow over profit-sharing and future collaboration [9][10].
- The company should prioritize maintaining its exclusive access to OpenAI's API through Azure while exploring partnerships with other model providers [10].

Group 5: Amazon
- Amazon's outlook has improved: AI is expected to benefit its business rather than disrupt it, particularly through AWS and product recommendations on Amazon.com [11][12].
- Its partnership with Anthropic appears more stable than Microsoft's relationship with OpenAI, giving Amazon a strategic advantage [12].

Group 6: Emerging AI Companies
- OpenAI has established dominance in consumer AI but faces conflicts with companies like Microsoft and Apple over customer relationships [13][14].
- Anthropic has built a strong position among developers, focusing on API revenue and maintaining a stable partnership with AWS [14].
- xAI is struggling with its infrastructure strategy and should seek investment to strengthen its market position [15].
Summer Davos Forum examines development paths for developing countries
Zhong Guo Xin Wen Wang· 2025-06-24 12:08
Group 1
- The core viewpoint of the article is that industrialization and technological innovation are crucial to the development of emerging and developing economies [1][2]
- According to the International Monetary Fund, emerging and developing economies accounted for 58.9% of global GDP in 2023 [1]
- Experts at the forum noted that many developing countries struggle to implement industrialization despite having the necessary strategies, particularly in the context of globalization [1]

Group 2
- The chairman of the Hong Kong Stock Exchange said the future of industrialization lies in innovative technologies, an area in which China has invested heavily [2]
- He cited the development of large language models as an example, arguing that massive funding is not a prerequisite for success, as demonstrated by China's DeepSeek model [2]
- The chairman of Angola's Unitel emphasized the importance of diversifying technology sources and of sending students and workers abroad so that foreign technologies can be put to better use [2]
Zantong Technology (赞同科技) showcases fintech achievements at the 2025 China International Financial Expo
Sou Hu Cai Jing· 2025-06-24 09:20
Core Insights
- The 2025 China International Financial Expo was successfully held in Shanghai under the theme "Open Innovation, Technology Empowerment, and Co-creating a New Future for Finance," with over 400 financial institutions, technology companies, and industry organizations participating [1]

Group 1: Company Innovations
- Zantong Technology showcased a multi-purpose lightweight business terminal solution driven by advanced large language models, which switches seamlessly between tablet and terminal modes and reshapes traditional service models at financial institutions [1]
- The Zantong intelligent banking solution improves operational efficiency and intelligence by letting large language models autonomously connect and assist across business processes, positioning AI as a "co-pilot" for banking professionals [1]
- The company also presented a mobile banking product built on HarmonyOS 5.0 that integrates the system's native capabilities for a personalized, intuitive user experience, enabling users to complete transactions efficiently through simple spoken requests [3]

Group 2: Industry Reception and Future Plans
- The innovations received strong praise from industry peers, with many attendees expressing interest in the products after experiencing their convenience and efficiency first-hand [4]
- Zantong Technology aims to continue tracking industry trends and to work with more partners to advance the fintech sector, focusing on safer, more convenient, and more intelligent financial services [4]
- Guided by its development philosophy of "Persistence, Innovation, Trust, and Respect," the company plans to launch more financial technology products with independent intellectual property rights and core competitiveness [4]
Breaking: ByteDance's Seed large language model lead dismissed, losing tens of millions
是说芯语· 2025-06-24 02:05
Core Insights
- ByteDance recently disclosed a serious violation involving senior members of the Seed team, resulting in the dismissal of Qiao Mu, head of the Seed large language model [1]
- The violation involved an inappropriate personal relationship between Qiao Mu and an HRBP, which breached the company's conflict-of-interest policy [1]
- Qiao Mu's total earnings over 11 years at ByteDance are estimated to exceed 500 million RMB, with a significant share coming from stock options [2]

Group 1
- The violation included failure to declare a personal relationship that breached company policy on conflicts of interest [1]
- Qiao Mu and the HRBP provided false statements during the investigation, leading to severe disciplinary action, including termination and forfeiture of year-end bonuses [1]
- Qiao Mu's annual salary is estimated at over 10 million RMB, based on industry comparisons [1][2]

Group 2
- The company's stock options have appreciated significantly, with the repurchase price rising from approximately 5 USD per share in 2014 to 189 USD, a roughly 38-fold increase [2]
- If Qiao Mu's annual compensation included 1 million RMB in cash and 1 million RMB in options, the value of those options would have surged to about 39 million RMB today (a quick arithmetic check follows this summary) [2]
- The Seed team recently released the Seed1.5-VL model, which demonstrates advanced multimodal understanding and reasoning capabilities [3]
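A back-of-the-envelope check of the option figures cited above, using only the article's own numbers (the 1 million RMB annual option grant is the article's hypothetical, not a confirmed figure):

```python
# Quick arithmetic check of the option appreciation figures reported in the summary.
grant_price = 5.0        # USD per share, approximate 2014 repurchase price (as reported)
current_price = 189.0    # USD per share, current repurchase price (as reported)
multiple = current_price / grant_price
print(f"Appreciation: {multiple:.1f}x")   # ~37.8x, i.e. the roughly 38-fold increase cited

options_grant_rmb = 1_000_000             # the article's hypothetical annual option grant
value_now_rmb = options_grant_rmb * multiple
print(f"1M RMB of 2014-priced options today: ~{value_now_rmb / 1e6:.0f}M RMB")
# ~38M RMB, in the same ballpark as the "about 39 million RMB" figure in the article
```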