Hong Kong's First AGI Stock: Yunzhisheng IPOs Today
36Kr· 2025-06-30 02:07
Core Viewpoint - Yunzhisheng, a leading domestic AI company, has successfully listed on the Hong Kong Stock Exchange, raising 206 million HKD at an issue price of 205 HKD per share, after 13 years of operation and 10 rounds of financing totaling over 2 billion RMB [1][2].

Group 1: Financial Performance
- Yunzhisheng's revenue for 2022, 2023, and 2024 was 601 million RMB, 727 million RMB, and 939 million RMB, respectively, while corresponding losses were 375 million RMB, 376 million RMB, and 454 million RMB, totaling nearly 1.2 billion RMB in losses over three years [2].
- Revenue from the "Smart Life" segment grew 27.8% in 2024 and contributed nearly 80% of total revenue [3][4].

Group 2: Business Segments
- The "Smart Life" segment includes personalized solutions and AI capability APIs, with the former being the primary revenue source, covering over 700 types of home appliances and achieving a 70% market share in the voice interaction market for white goods [5].
- In the "Smart Transportation" sector, Yunzhisheng has implemented a voice ticketing system for Shenzhen Metro Line 20, reducing ticket purchase time from 15 seconds to 1.5 seconds and serving over 30,000 users daily [6].

Group 3: Market Position and Competition
- In the medical AI market, Yunzhisheng ranks fourth with a market share of 2.1%, with revenues of 113 million RMB, 148 million RMB, and 199 million RMB from 2022 to 2024, a compound annual growth rate of 36.6% [7][8].
- The company faces challenges in the medical sector due to product homogeneity and a lack of significant breakthroughs, as its AI solutions for medical record input and quality control are becoming increasingly saturated [9][11].
Group 4: Future Outlook - Yunzhisheng's strategy is to focus on the "Smart Life" segment, as the medical sector is unlikely to provide substantial short-term relief from losses, while the potential for growth in smart home and vehicle voice applications remains promising [15].
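The compound annual growth rate cited for the medical business follows the standard formula; a minimal worked check (function name is illustrative, not from the article):

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1

# Applied to the reported medical-AI revenue (113M RMB in 2022 -> 199M in 2024):
# cagr(113, 199, 2) comes out to roughly 0.33; published CAGRs can differ
# depending on the base period and figures used.
```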
A Rundown: The Important LLM Papers Since the 2017 Transformer
机器之心· 2025-06-29 04:23
Core Insights
- The article discusses Andrej Karpathy's concept of "Software 3.0," in which natural language becomes the new programming interface and AI models execute specific tasks [1][2].
- It emphasizes the transformative impact of this shift on developers, users, and software design paradigms, indicating that a new computational framework is being constructed [2].

Development of LLMs
- The evolution of large language models (LLMs) has accelerated since the introduction of the Transformer architecture in 2017, leading to significant advancements in the GPT series and multimodal capabilities [3][5].
- Key foundational papers that established today's AI capabilities are reviewed, highlighting the transition from traditional programming to natural-language interaction [5][6].

Foundational Theories
- "Attention Is All You Need" (2017) introduced the Transformer architecture, which relies solely on self-attention mechanisms, revolutionizing natural language processing and computer vision [10][11].
- "Language Models are Few-Shot Learners" (2020) demonstrated the capabilities of GPT-3, establishing the "large model + large data" scaling law as a pathway toward more general artificial intelligence [13][18].
- "Deep Reinforcement Learning from Human Preferences" (2017) laid the groundwork for reinforcement learning from human feedback (RLHF), crucial for aligning AI outputs with human values [15][18].

Milestone Breakthroughs
- The "GPT-4 Technical Report" (2023) details a large-scale, multimodal language model that exhibits human-level performance across various benchmarks, emphasizing the importance of AI safety and alignment [26][27].
- The release of the LLaMA models (2023) demonstrated that smaller models trained on extensive datasets could outperform larger models, promoting a new approach to model efficiency [27][30].
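The self-attention mechanism at the heart of "Attention Is All You Need" can be sketched in a few lines. The following is a minimal, framework-free illustration of scaled dot-product attention (a sketch of the paper's core formula, not a production implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, the core operation
    of the Transformer ("Attention Is All You Need", 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights
```

Each output row is a weighted mixture of the value vectors, with weights computed from query-key similarity; stacking this operation across heads and layers is what the paper's architecture does at scale.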
Emerging Techniques
- "Chain-of-Thought Prompting" enhances reasoning in LLMs by guiding them to articulate intermediate reasoning steps before arriving at conclusions [32][33].
- "Direct Preference Optimization" (2023) simplifies the alignment of language models by using human preference data directly, making it a widely adopted method in the industry [34][35].

Important Optimizations
- The "PagedAttention" mechanism improves memory management for LLMs, significantly enhancing throughput and reducing memory usage during inference [51][52].
- The "Mistral 7B" model showcases how smaller models can achieve high performance through innovative architecture, influencing the development of efficient AI applications [55][56].
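The idea behind Direct Preference Optimization can be illustrated on a single preference pair: the loss rewards the policy for widening its chosen-vs-rejected log-probability gap relative to a frozen reference model. A minimal sketch (variable names are illustrative, not from the paper's code):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Single-pair DPO loss: -log sigmoid(beta * margin), where the margin
    compares the policy's log-prob gap (chosen vs. rejected) against the
    reference model's gap. All inputs are sequence log-probabilities."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference, the margin is zero and the loss is log 2; training drives the margin positive without needing a separately trained reward model, which is the simplification the paper is known for.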
ChatGPT Saved My Life
Hu Xiu· 2025-06-28 05:51
This article comes from the WeChat public account APPSO (author: appso; header image: AI-generated).

No joke: ChatGPT really can save lives.

A Reddit user recently posted that he and a few friends got lost for five hours riding ATVs (all-terrain vehicles) on unmarked forest trails. They tried Google Maps, Polaris, dedicated ATV mapping apps... none of them helped, because those maps only show main roads.

Then someone turned to ChatGPT for help, sending it their GPS coordinates every few minutes. GPT replied with clear compass directions, road names, and terrain information, and eventually guided them safely home.

Does that mean Chongqing's "8D magic traffic," the famed "nemesis of navigation," might have hope too?

Before going there, one question is worth settling: was ChatGPT's success here survivorship bias? Quite a few commenters raised the risk of AI hallucination. Or can it genuinely be applied to outdoor navigation tasks?

Can AI beat Google Maps for navigating in the wild? Keep your guard up. X blogger Rohan Paul posted that a number of studies now show that using large language models for outdoor navigation can work even better than traditional maps. For example, a study published this May in Translational Vision Science & Te ...
Large Models Come to Aircraft Engines: Solving Complex Time-Series Problems and Surpassing ChatGPT-4o for SOTA | SJTU, Shanghai Innovation Institute, Fudan
量子位· 2025-06-28 04:42
Contributed by the ITFormer team. 量子位 | WeChat account QbitAI

Time-series data analysis is critical in fields such as industrial monitoring and medical diagnosis.

In a complex industrial scenario like aircraft-engine monitoring, engineers must analyze massive multi-channel sensor data to assess equipment status and make maintenance decisions.

However, most existing research focuses on single tasks such as classification or forecasting, which differs significantly from real industrial practice, where experts carry out complex interaction and decision-making through natural language.

Taking aircraft-engine operations and maintenance as the backdrop, Professor Li Yuanxiang's team at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, together with Shanghai Innovation Institute and the School of Data Science at Fudan University, proposes ITFormer, an efficient and transferable time-series-to-language bridging architecture. It abstracts the expert diagnostic process into four cognitive levels ("understanding, perception, reasoning, decision-making") and, for the first time, systematically defines this as a "time-series question answering" task paradigm.

Based on NASA aircraft-engine data, the team built EngineMT-QA, a dataset of more than 110,000 question-answer pairs. Its task design closely follows experts' cognitive workflow, providing the first standardized benchmark for evaluating models' reasoning ability in real industrial scenarios.

Results show that ITFormer's modular design achieves efficient fusion of time-series data with large language models: by training fewer than 1% additional parameters, it delivers superior performance and strong transferability on general time-series QA datasets, demonstrating excellent "plug-and-play" characteristics.

It can seamlessly adapt to Patch ...
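ITFormer's own code is not reproduced here, but the first step of any time-series-to-language bridge, turning multi-channel sensor data into token embeddings an LLM can attend to, can be sketched generically. A hypothetical patch-embedding illustration, not the paper's implementation (the random projection stands in for a learned one):

```python
import numpy as np

def patch_embed(series, patch_len=16, d_model=32, seed=0):
    """Split a (channels, time) sensor series into fixed-length patches and
    linearly project each patch to a d_model-dim token. A generic time-series
    tokenization sketch; the projection here is random for illustration."""
    C, T = series.shape
    n = T // patch_len                                   # whole patches only
    patches = series[:, :n * patch_len].reshape(C, n, patch_len)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((patch_len, d_model)) / np.sqrt(patch_len)
    return patches @ W                                   # (C, n, d_model)
```

The resulting token grid is what a bridging module would align with the language model's embedding space, which is why only a small number of additional parameters need training.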
Why Hasn't DeepSeek-R2 Shipped Yet?
量子位· 2025-06-27 08:09
Core Viewpoint - The release of DeepSeek-R2 has been delayed due to CEO Liang Wenfeng's dissatisfaction with its performance and a shortage of Nvidia H20 chips, which are critical for its development [1][2][4].

Development Timeline
- The anticipation for R2 began after the release of the DeepSeek-V3 model in December last year, which was considered a benchmark for cost-performance [5].
- An upgrade to V3 was announced in March this year, leading to speculation that R2 would be released in April [11].
- Despite the release of a paper on scaling laws in early April, there has been no official update on R2 since then [12][16].

Technical Specifications
- R1's training utilized 30,000 H20 chips, 10,000 H800 chips, and 10,000 H100 chips, indicating the significant computational resources required for R2 [3].
- Leaked parameters for R2 suggested it would have 1.2 trillion parameters and utilize 5.2 petabytes of training data, although the authenticity of these claims remains uncertain [17].

Community Reactions
- Following the news of the delay, community responses varied: some believe the delay is justified, while others speculated that R2 might wait for the release of V4 [26][30].
What Exactly Is Goal-Oriented Navigation, This Year's Hot Topic? What Are the Routes from Goal Search to Goal Reaching?
具身智能之心· 2025-06-26 14:19
Core Viewpoint - Goal-Oriented Navigation empowers robots to autonomously complete navigation tasks based on goal descriptions, marking a significant shift from traditional vision-and-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, relying on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-Oriented Navigation requires robots to explore and plan paths in unfamiliar 3D environments using only goal descriptions such as coordinates, images, or natural language [2].
- The technology has been industrialized in various verticals, including delivery, healthcare, and hospitality, enhancing service efficiency [3].

Group 2: Technological Evolution
- The evolution of Goal-Oriented Navigation can be categorized into three generations:
  - First generation: end-to-end methods focusing on reinforcement learning and imitation learning, achieving breakthroughs in point navigation and closed-set image navigation tasks [5].
  - Second generation: modular methods that explicitly construct semantic maps, breaking tasks into exploration and goal localization [5].
  - Third generation: integration of large language models (LLMs) and vision-language models (VLMs) to enhance knowledge reasoning and open-vocabulary target matching [7].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation, particularly Goal-Oriented Navigation, necessitates knowledge from multiple fields, making the domain challenging for newcomers to enter [9].
- A new course has been developed to address these challenges, focusing on quick entry, building a research framework, and combining theory with practice [10][11][12].

Group 4: Course Structure
- The course will cover the theoretical foundations and technical lineage of Goal-Oriented Navigation, including task definitions and evaluation benchmarks [15].
- It will also delve into the Habitat simulation ecosystem, end-to-end navigation methodologies, modular navigation architectures, and LLM/VLM-driven navigation systems [16][18][20][22].
- A significant project will focus on reproducing VLFM algorithms and deploying them in real-world scenarios [24].
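The second-generation modular methods above typically alternate between exploring toward "frontiers" and localizing the goal on a semantic map. A minimal sketch of frontier detection on an occupancy grid (illustrative only; real systems such as Habitat agents work with richer semantic maps):

```python
import numpy as np

FREE, OBSTACLE, UNKNOWN = 0, 1, -1

def find_frontiers(grid):
    """Return free cells adjacent to unknown space ("frontiers") on an
    occupancy grid; a modular explorer would navigate toward the nearest
    frontier until the goal object is observed."""
    H, W = grid.shape
    frontiers = []
    for y in range(H):
        for x in range(W):
            if grid[y, x] != FREE:
                continue
            # A free cell touching any unknown neighbor is a frontier cell.
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and grid[ny, nx] == UNKNOWN:
                    frontiers.append((y, x))
                    break
    return frontiers
```

Third-generation systems keep this exploration loop but let an LLM/VLM score frontiers by how likely the goal description is to be found beyond them.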
Zhang Yaqin: EV Brands May Consolidate in the Future; by 2030, 10% of New Cars Will Have L4 Autonomous Driving Capability
Sou Hu Cai Jing· 2025-06-26 10:04
Group 1
- The 16th Summer Davos Forum (Annual Meeting of the New Champions) was held in Tianjin from June 24 to 26, 2025, with Zhang Yaqin, a foreign academician of the Chinese Academy of Engineering and director of Tsinghua University's Institute for AI Industry Research, attending and sharing insights [2]
- Zhang Yaqin predicts that the autonomous driving sector is approaching a "DeepSeek moment," highlighting robotaxi technology, which has seen substantial progress and commercialization in cities like San Francisco, Los Angeles, Austin (Tesla's home base), and Tokyo [2]
- In China, Baidu's Apollo Go system has been operational the longest, successfully covering Wuhan with over 1,000 vehicles, indicating a new phase for the industry, alongside efforts from companies like WeRide [2]

Group 2
- Zhang Yaqin emphasizes two core goals for safety and economic efficiency: making autonomous vehicles ten times safer than human drivers, aiming to eliminate the 90% of accidents caused by human error, and transforming vehicle economics by removing driver costs, potentially doubling operational efficiency [3]
- The development of generative AI and large language models is helping to address two major challenges in autonomous driving: processing and understanding vast amounts of data, and enabling end-to-end training of decision models, which simplifies the system while maintaining safety boundaries [3]
- Zhang Yaqin forecasts that by 2030, 10% of new vehicle shipments will possess L4 autonomous driving capabilities, serving both robotaxi and consumer markets, while also noting the need for improved charging infrastructure and competitive regulation in the electric vehicle ecosystem [4]
How Do You Run a Realistic 3D Digital Human on a Phone in Real Time? MNN-TaoAvatar Is Now Open Source!
机器之心· 2025-06-25 00:46
Core Viewpoint - TaoAvatar is a breakthrough 3D digital human technology developed by Alibaba's Taobao Meta Technology team, enabling real-time rendering and AI dialogue on mobile and XR devices and providing users with a realistic virtual interaction experience [1][8].

Group 1: Technology Overview
- TaoAvatar utilizes advanced 3D Gaussian splatting technology to create lifelike full-body avatars that capture intricate facial expressions and gestures, as well as details like clothing folds and hair movement [8].
- The technology significantly reduces the cost and increases the efficiency of digital human modeling, facilitating large-scale applications [9].
- MNN-TaoAvatar is an open-source 3D digital human application that integrates multiple leading AI technologies, allowing natural voice interaction with digital humans on mobile devices [10].

Group 2: Performance Metrics
- The application runs efficiently on mobile devices, with key performance metrics as follows [16][17][18]:
  - ASR (Automatic Speech Recognition): model size 281.65 MB, RTF 0.18
  - LLM (Large Language Model): model size 838.74 MB, prefill speed 165 tokens/s, decode speed 41.16 tokens/s
  - TTS (Text-to-Speech): model size 1.34 GB, RTF 0.58
  - A2BS (Audio-to-BlendShape): model size 368.71 MB, RTF 0.34
  - NNR (Rendering Output): model size 138.40 MB, rendering frame rate 60 FPS

Group 3: Development and Optimization
- MNN-TaoAvatar is built on the MNN engine, which supports the various algorithm modules and enhances the performance of AI applications in real-time scenarios [23][30].
- The MNN-LLM module demonstrates superior CPU performance, with prefill speed improved by 8.6x over llama.cpp and decoding speed improved by 2.3x [34].
- The MNN-NNR rendering engine employs optimizations such as data synchronization and scheduling to ensure efficient rendering, achieving smooth 60 FPS output even with lower-frequency updates [40][45].
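The RTF (real-time factor) figures quoted for ASR, TTS, and A2BS follow the standard definition, processing time divided by audio duration, so values below 1.0 mean faster-than-real-time processing:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; RTF < 1.0 means the model
    processes (or synthesizes) audio faster than real time."""
    return processing_seconds / audio_seconds

# e.g. an ASR model that transcribes 10 s of audio in 1.8 s has RTF 0.18,
# matching the figure reported for the ASR module above.
```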
Group 4: Hardware Requirements - Recommended hardware for MNN-TaoAvatar includes devices with Qualcomm Snapdragon 8 Gen 3 or equivalent CPU, at least 8GB of RAM, and 5GB of storage for model files [51].
What Exactly Is Goal-Oriented Navigation in Embodied AI? What Are the Routes from Goal Search to Goal Reaching?
具身智能之心· 2025-06-24 14:09
Goal-driven navigation gives robots the ability to complete navigation goals autonomously.

Embodied navigation, a core area of embodied intelligence, involves three technical pillars: language understanding, environmental perception, and path planning. Goal-Oriented Navigation, which grants robots autonomous decision-making, is the most representative direction within embodied navigation. It requires an agent in an unfamiliar 3D environment to complete environment exploration and path planning using only a goal description (such as coordinates, an image, or natural language).

Unlike traditional vision-and-language navigation (VLN), which relies on explicit instructions, a goal-driven navigation system must make the leap from "understand the instruction and walk the right path" to "understand the world and find its own path": when a human issues the instruction "go to the kitchen and get a cola," the robot must autonomously perform semantic parsing (recognizing the spatial features of a kitchen and the visual attributes of a cola), environment modeling (building a spatial topology of the home scene), and dynamic decision-making (avoiding moving humans or pets). Behind this lie intersecting breakthroughs in computer vision, reinforcement learning, and 3D semantic understanding.

Goal-driven navigation has already been industrialized in several verticals. In last-mile delivery, combined with social navigation algorithms, it equips robots to handle dynamic environments and human interaction: Meituan's autonomous delivery vehicles use dynamic path replanning to carry out deliveries in complex urban environments, and Starship Technologies' campus delivery robots have been deployed at universities and communities in Europe and the US. In healthcare, hotel, and dining scenarios, ...
Understanding the US AI War in One Article: The "Big Five Tech Giants" vs. the "Three Little AI Dragons"
硬AI· 2025-06-24 12:28
Core Viewpoint - The article highlights the intense competition in the AI arms race among traditional tech giants and emerging AI companies, with Meta's aggressive talent acquisition reflecting the urgency of the situation [1][2].

Group 1: Apple
- Apple has faced significant setbacks in its AI initiatives, particularly with the Apple Intelligence project, and while it maintains hardware advantages, it needs deeper AI collaborations [4][5].
- The company's core business remains unaffected by AI threats, as AI applications still rely on Apple devices for access [4].
- Apple should focus on building the best hardware for the AI era and invest in robotics and home automation to maintain its competitive edge [5].

Group 2: Google
- Google has a leading position in AI infrastructure, with its Gemini model excelling in media creation, but its core search business faces disruptive threats from conversational AI [6][7].
- The company benefits from vast data resources and distribution channels, particularly through its Android system, which could challenge Apple's dominance in the high-end market [7].
- Google is working to transform AI from a disruptive technology into an enhancement tool for its search capabilities [7].

Group 3: Meta
- Meta's strategic positioning is solid, focusing on personalized content and generative advertising, but it faces execution challenges and risks from competition for attention [8].
- The urgency of Meta's talent recruitment indicates a recognition of significant threats to its core business from AI developments [8].

Group 4: Microsoft
- Microsoft remains in a strong position but faces new challenges due to increasing tensions with OpenAI regarding profit-sharing and future collaboration [9][10].
- The company should prioritize maintaining its exclusive access to OpenAI's API through Azure while exploring partnerships with other model providers [10].

Group 5: Amazon
- Amazon's outlook has improved, as AI is expected to benefit its business rather than disrupt it, particularly through AWS and product recommendations on Amazon.com [11][12].
- The partnership with Anthropic appears more stable than Microsoft's relationship with OpenAI, giving Amazon a strategic advantage [12].

Group 6: Emerging AI Companies
- OpenAI has established dominance in consumer AI but faces conflicts with companies like Microsoft and Apple over customer relationships [13][14].
- Anthropic has built a strong position among developers, focusing on API revenue streams and maintaining a stable partnership with AWS [14].
- xAI is struggling with its infrastructure strategy and should seek investments to enhance its market position [15].
Group 5: Amazon - Amazon's outlook has improved, as AI is expected to benefit its business rather than disrupt it, particularly through AWS and product recommendations on Amazon.com [11][12]. - The partnership with Anthropic appears more stable compared to Microsoft's relationship with OpenAI, providing Amazon with a strategic advantage [12]. Group 6: Emerging AI Companies - OpenAI has established dominance in consumer AI, but faces conflicts with companies like Microsoft and Apple over customer relationships [13][14]. - Anthropic has built a strong position among developers, focusing on API revenue streams and maintaining a stable partnership with AWS [14]. - xAI is struggling with its infrastructure strategy and should seek investments to enhance its market position [15].