Large Language Models
ICCV 2025 | HERMES: The First World Model to Unify 3D Scene Understanding and Generation
机器之心 (Synced) · 2025-08-14 04:57
Core Viewpoint
- The article discusses advances in autonomous driving technology and argues for a unified model that both understands the current environment and effectively predicts future scenarios [7][10][30].
Research Background and Motivation
- Recent progress in autonomous driving requires vehicles to understand their current environment deeply and to predict future scenarios accurately, so that they can navigate safely and efficiently [7].
- Mainstream solutions separate "understanding" from "generation". This separation limits effective decision-making in real-world driving scenarios [8][10].
Method: HERMES Unified Framework
- HERMES proposes a unified framework in which a shared large language model (LLM) drives both understanding and generation tasks simultaneously [13][30].
- The framework addresses challenges such as efficiently inputting high-resolution images and integrating world knowledge with predictive capabilities [11][12].
HERMES Core Design
- HERMES uses Bird's-Eye View (BEV) as its unified scene representation. BEV encodes multiple images efficiently while preserving spatial relationships and semantic details [18].
- World Queries connect scene understanding with future prediction, which improves the model's ability to generate accurate future scenes [19][20].
Joint Training and Optimization
- HERMES is trained jointly with two optimization objectives: a language modeling loss for understanding tasks and a point cloud generation loss for accurate future prediction [21][22][23].
Experimental Results and Visualization
- HERMES achieves superior performance on scene understanding and future generation tasks on datasets such as nuScenes and OmniDrive-nuScenes [26].
- The model generates coherent future point clouds and accurately describes driving scenes, demonstrating its comprehensive capabilities [27].
Summary and Future Outlook
- HERMES presents a new paradigm for autonomous driving world models, bridging the gap between 3D scene understanding and future generation [30].
- The model shows significant improvements over traditional models in prediction accuracy and understanding tasks, validating the effectiveness of unified modeling [31].
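The joint objective can be written as L = L_LM + λ·L_pc. The following minimal NumPy sketch uses a token-level cross-entropy for the language modeling loss. It substitutes symmetric Chamfer distance for the point cloud generation loss. Both the Chamfer choice and the weight `lam` are assumptions, and HERMES's actual loss terms and weighting may differ.

```python
import numpy as np

def language_modeling_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean token-level cross-entropy; logits has shape (T, V), targets holds T token ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets pred (N, 3) and gt (M, 3).

    Assumed stand-in for the point cloud generation loss, not HERMES's confirmed loss.
    """
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def joint_loss(logits, targets, pred_points, gt_points, lam: float = 1.0) -> float:
    """Hypothetical joint objective: L = L_LM + lam * L_pc."""
    return language_modeling_loss(logits, targets) + lam * chamfer_distance(pred_points, gt_points)
```

Both terms are computed on the same forward pass and summed, so a single backward step in a real framework would update the shared LLM for understanding and generation at once.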
We All Misjudged GPT-5: Routing Unifies Compute, and Even Free Users Can Generate Revenue
QbitAI (量子位) · 2025-08-14 02:01
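It not only unifies scheduling across multiple models but also hides several of Altman's subtler aims, such as keeping costs more controllable and quietly identifying user intent to insert ads. Because GPT-5 is not open source, however, outsiders cannot know the specifics of this framework. Recently, a similar version appeared in the open-source community: Arch-Router. It combines the task domain (e.g., finance, law) with the specific action (e.g., summarization, code generation) to formulate routing policies. It then connects each request to the most suitable model, in alignment with human preferences.
henry, reporting from Aofeisi (凹非寺)
QbitAI | WeChat official account: QbitAI
Since GPT-5's release, its routing architecture has been one of the most closely watched parts of the system. Following this "open-source version", more of OpenAI's design behind the GPT-5 routing system has come to light.
What is a routing framework?
Existing routing methods fall into two main categories. The first is task-based routing, which sends a user's request directly to a predefined model that handles a specific task. The second is performance-based routing, which uses a cost-performance score to call the most cost-effective model.
However, user requests are often vague and subjective. As a result, both kinds of routing struggle to pinpoint user preferences and fail to produce satisfying answers.
To address this, researchers proposed the preference-aligned routing framework mentioned at the beginning, Arch-Router. It unifies routing policies and model selection according to user-defined preferences.
In this framework, users use a domain-action taxonomy ...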
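The minimal sketch below contrasts performance-based routing with preference-aligned domain-action routing. All model names, scores, costs, policies, and keyword rules are hypothetical. In particular, Arch-Router itself uses a trained router model to match a query against user-written policies. The keyword-based `classify` function here is only an illustrative stand-in for that model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical model identifier
    quality: float  # assumed quality score in [0, 1]
    cost: float     # assumed relative cost per request

def performance_route(candidates: list[Candidate], cost_weight: float = 0.5) -> str:
    """Performance-based routing: maximize quality minus weighted cost."""
    return max(candidates, key=lambda c: c.quality - cost_weight * c.cost).name

# Preference-aligned routing: user-defined (domain, action) policies map to a model.
POLICIES: dict[tuple[str, str], str] = {
    ("finance", "summarize"): "model-a",
    ("legal", "summarize"): "model-b",
    ("general", "generate_code"): "model-c",
}

def classify(query: str) -> tuple[str, str]:
    """Hypothetical keyword stand-in for the router model: query -> (domain, action)."""
    q = query.lower()
    if "earnings" in q:
        domain = "finance"
    elif "contract" in q:
        domain = "legal"
    else:
        domain = "general"
    action = "generate_code" if ("code" in q or "function" in q) else "summarize"
    return domain, action

def preference_route(query: str, default: str = "model-default") -> str:
    """Look up the user's policy for the query's (domain, action); fall back if none matches."""
    return POLICIES.get(classify(query), default)

print(preference_route("Summarize the earnings call"))
```

The design point the sketch illustrates is that the policy table is written by the user. Changing which model handles legal summaries means editing one entry, not retraining a cost-performance scorer.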
WAIC 2025 Decoded: What Signals Are China's AI Giants Really Sending?
Counterpoint Research· 2025-08-14 01:03
Core Insights
- WAIC 2025 highlighted the importance of global cooperation in AI governance. China proposed establishing a global AI governance body and released a framework with 13 cooperation points [2][3].
Group 1: AI Safety and Governance
- Geoffrey Hinton emphasized the potential risks of AI, suggesting that humans could become akin to "poultry" if AI systems operate independently [3].
- Hinton's visit to China signals that China's involvement is necessary to address AI governance and safety issues. This is in line with the multilateral AI governance framework signed with representatives from Europe, Southeast Asia, and parts of Africa [3].
- The conference shifted its focus from merely accelerating AI development to emphasizing safety principles and multilateral dialogue [3].
Group 2: Alibaba's AI Innovations
- Alibaba launched three high-performance open models and a new pair of AI smart glasses at WAIC 2025, reinforcing its open-source AI strategy [4][5].
- The smart glasses are lightweight and screenless. They integrate Alibaba's Qwen model, aiming to embed AI into daily interactions [5].
- This move makes Alibaba's open-source models competitive with both domestic and international counterparts, turning open-source competition into a platform battle [5][8].
Group 3: Unitree Technology's Robotics
- Unitree Technology introduced the R1 humanoid robot, priced at approximately $5,600. It is designed for general tasks, with dynamic movement and real-time perception [6][9].
- The R1 targets a broader audience than enterprise clients alone, including developers and research institutions, marking a shift toward accessible robotics [6].
- This pricing poses a competitive threat to Tesla's humanoid robot ambitions. Unitree's offering is significantly cheaper and aims to democratize access to robotics technology [9].
Tencent (00700) Q2 Earnings Call: Enough Chips for AI Training and Model Upgrades, with Multiple Options for AI Inference Chips
智通财经网 (Zhitong Finance) · 2025-08-13 22:21
Core Viewpoint
- Tencent's Q2 revenue rose 15% year-on-year to 184.5 billion RMB, exceeding expectations, and net profit grew 17% [1][12][3]
Financial Performance
- Total revenue for Q2 was about 185.0 billion RMB. Gross profit was 105.0 billion RMB, a 22% year-on-year increase [3][12]
- Non-IFRS operating profit reached 69.0 billion RMB, up 18% year-on-year. Net profit attributable to shareholders was 63.0 billion RMB, a 10% increase [3][12]
- Excluding contributions from associates, core net profit grew 20% [12]
Business Segments
- Value-added services accounted for 50% of total revenue: social networks contributed 18%, domestic games 22%, and international games 10% [5]
- Marketing services revenue grew 20% year-on-year, driven by AI technology enhancements [10][11]
- Financial technology and enterprise services accounted for 30% of total revenue, with 10% year-on-year growth [11]
Gaming Performance
- Domestic game revenue grew 17%, supported by titles such as "Delta Force" and evergreen games such as "Honor of Kings" [6][12]
- International game revenue increased 35%, driven by popular titles such as "PUBG Mobile" [6][12]
AI Integration and Advertising
- AI has significantly boosted advertising revenue, which grew 20% year-on-year thanks to higher click-through rates and more traffic from video and search [10][15]
- The company is integrating AI features across platforms, including WeChat and Tencent Meeting, to improve user experience and operational efficiency [2][19]
Capital Expenditure and Investment
- Capital expenditure in Q2 more than doubled to 19.1 billion RMB, primarily to support AI capabilities [1][14]
- The company is prioritizing capital spending as AI investment rises. It is also awaiting clarity on chip imports, particularly from the U.S. [1][12]
User Engagement and Growth
- WeChat's monthly active users grew 3% to 1.41 billion, and more AI functionality is being integrated [1][12]
- The company is strengthening its social commerce experience with features that encourage user interaction and sharing [7][8]
Future Outlook
- Management expressed confidence in the long-term growth potential of advertising revenue, driven by AI applications and increased user engagement [15][16]
- The company is exploring further investment in AI-driven applications and services to maintain its competitive advantage and drive future growth [19][31]
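Chinese financial reporting quotes amounts in 亿 (1 亿 = 100 million). A figure such as 1845亿 therefore corresponds to 184.5 billion RMB, not 1,845 billion. The minimal sketch below shows the conversion for the quarter's revenue, gross profit, and capex figures as reported in 亿. It also computes the gross margin implied by revenue and gross profit. The helper name is illustrative.

```python
YI = 10**8  # 1 亿 (yi) = 100 million

def yi_to_billion(amount_in_yi: float) -> float:
    """Convert an amount quoted in 亿 into billions (1 亿 = 0.1 billion)."""
    return amount_in_yi * YI / 10**9

revenue_bn = yi_to_billion(1845)       # Q2 revenue quoted as 1845亿
gross_profit_bn = yi_to_billion(1050)  # Q2 gross profit quoted as 1050亿
capex_bn = yi_to_billion(191)          # Q2 capex quoted as 191亿

gross_margin = gross_profit_bn / revenue_bn
print(f"revenue={revenue_bn} bn, capex={capex_bn} bn, gross margin={gross_margin:.1%}")
```

Using the 184.5 billion revenue figure, the implied gross margin is roughly 57%.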
A "Big Year" Quietly Arrives: The Market Environment Delivers a Feast for Quant Strategies
Zhong Guo Zheng Quan Bao· 2025-08-13 21:08
Group 1
- The core viewpoint is that this year is a "big year" for quantitative strategies. Many private equity funds have achieved returns exceeding 40% [1][2][6]
- Quantitative stock selection strategies have outperformed index-enhanced strategies, with several funds reporting returns above 50% [2][6]
- Alternative data, continuous signal mining, and the integration of artificial intelligence have contributed to the strong performance of quantitative strategies [3][4]
Group 2
- Notable private equity firms, both established and emerging, have earned substantial returns from their quantitative stock selection products [2][6]
- The "air index-enhancement" strategy has gained popularity because its flexible stock selection lets it adapt effectively to changes in market style [3][4]
- The 36 quantitative private equity firms that each manage over 10 billion yuan have achieved an average return of 18.92%. A significant number of them returned more than 10% [6]
Group 3
- This year's market environment has favored quantitative strategies, driven by increased liquidity and reduced leverage risk [6]
- Small-cap index-enhanced products have also performed well, with several funds reporting returns exceeding 40% [7]
- Improved market liquidity and active small-cap stocks have significantly boosted the overall performance of quantitative stock strategies [7]
Hundred-Million-Yuan Orders Start Pouring In, but Performances Alone Cannot Sustain the Robotics Sector
Di Yi Cai Jing· 2025-08-13 12:29
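A gap remains between signing commercial orders and the steps that follow: completing delivery, putting robots to actual work, and collecting data feedback for training.
Since the start of this year, leading manufacturers such as Unitree, AgiBot, and UBTech have disclosed humanoid robot orders totaling more than 200 million yuan. Meanwhile, several embodied-AI companies announced order figures at the World Robot Conference (WRC), with volumes reaching over a hundred units. A Yicai reporter learned at the conference that buyers come mainly from telecom operators, automakers, and 3C (computer, communication, and consumer electronics) and semiconductor companies.
However, order figures and real-world deployment still diverge. Kang Yu, general manager of the board office at Shoucheng Holdings, told Yicai that some companies chase orders in order to raise funding. For others, a signed order still depends on how long the fulfillment period is. Some orders, though signed, may not be delivered on time because of changes in supply-chain capacity. From an investor's perspective, order data should be treated with great caution.
"Having orders is a good thing; it shows the commercial path works. But deploying robots requires making the economics clear to customers, and for now large orders mainly provide added value in the form of information value and brand value," Wang Qian, CEO of X Square Robot (自变量机器人), told reporters. He said the industry is not far from the tipping point for real deployment. He estimated it could be reached in about a year, whether for X Square or its peers.
A Flurry of Order Disclosures
So far, large orders in the robotics industry are still led by AgiBot, Unitree, and UBTech. AgiBot and Unitree Technology jointly won a 124 million yuan humanoid robot procurement order from China Mobile (Hangzhou) Information Technology Co., Ltd. (abbreviated "中移杭"), a subsidiary of China Mobile. ...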
Breaking SAM's Limits! Sun Yat-sen University's X-SAM: A Unified Framework Sweeps 20+ Segmentation Benchmarks
自动驾驶之心· 2025-08-12 10:37
Core Insights
- The article introduces X-SAM, a new segmentation framework that overcomes the limitations of the Segment Anything Model (SAM) by enabling multi-task processing and integrating multi-modal understanding [3][4][5].
Group 1: Limitations of SAM
- SAM was initially seen as a universal solution for visual segmentation, but it has significant limitations. It cannot handle multiple tasks simultaneously, and it does not understand textual instructions [2][5][6].
- SAM is designed for single-object segmentation from visual prompts. It cannot perform complex tasks such as semantic, instance, or panoptic segmentation [6].
- The article highlights a gap between visual segmentation and multi-modal understanding: existing models can either understand images or perform pixel-level segmentation, but not both effectively [5][6].
Group 2: Innovations of X-SAM
- X-SAM is designed to fill the gap left by SAM with a unified segmentation framework that handles various tasks and input types [7][8].
- X-SAM's architecture includes a dual-encoder system that processes both visual and textual inputs, giving the model a comprehensive understanding of images and instructions [12][14].
- X-SAM introduces a unified input format that standardizes how different segmentation tasks are processed, so the model can understand both textual and visual prompts [13][15].
Group 3: Performance and Testing
- X-SAM has been tested on more than 20 segmentation datasets across 7 core tasks and outperforms existing models in all categories [4][27].
- In visual grounded (VGD) segmentation, the model achieves an average precision (AP) of 47.9 to 49.7, significantly surpassing previous models [26][35].
- On COCO panoptic segmentation, X-SAM achieves a panoptic quality (PQ) of 54.7, demonstrating its robustness on foundational segmentation tasks [31].
Group 4: Training Methodology
- X-SAM uses a multi-stage training strategy: segmenter fine-tuning, alignment pre-training, and mixed fine-tuning across various datasets [21][23].
- Training incorporates a data-balancing resampling strategy so that smaller datasets are not overshadowed by larger ones, optimizing overall performance [24].
- The architecture allows simultaneous training on multiple tasks, enhancing generalization [37].
Group 5: Future Directions
- The research team plans to extend X-SAM to video segmentation and dynamic scenes, aiming to bridge static image understanding and video comprehension [43].
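The data-balancing resampling step can be illustrated with a minimal sketch. The summary does not give X-SAM's exact formula. This sketch therefore assumes square-root repeat-factor resampling, a common technique in which any dataset smaller than a threshold size is sampled proportionally more often. The dataset names, sizes, and threshold are hypothetical.

```python
import math

def repeat_factors(dataset_sizes: dict[str, int], threshold: float) -> dict[str, float]:
    """Assumed square-root rule: datasets smaller than `threshold` get a repeat factor above 1."""
    return {name: max(1.0, math.sqrt(threshold / size)) for name, size in dataset_sizes.items()}

def sampling_weights(dataset_sizes: dict[str, int], threshold: float) -> dict[str, float]:
    """Turn effective sizes (size * repeat factor) into per-dataset sampling probabilities."""
    factors = repeat_factors(dataset_sizes, threshold)
    effective = {name: dataset_sizes[name] * factors[name] for name in dataset_sizes}
    total = sum(effective.values())
    return {name: value / total for name, value in effective.items()}

sizes = {"large_dataset": 10_000, "small_dataset": 100}  # hypothetical datasets
print(sampling_weights(sizes, threshold=10_000))
```

Without resampling, the small dataset would supply about 1% of training samples (100 of 10,100). With these factors, its share rises to about 9%, while the large dataset is never down-weighted below its natural size.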
"Profit Margins Are Either Zero or Negative"! Are the Hottest AI Apps Just "Working for the Large Models"?
Hua Er Jie Jian Wen· 2025-08-12 03:31
Core Insights
- The AI programming assistant market appears prosperous, but many unicorn companies are incurring significant losses because large language model usage is expensive [1][5]
- Despite soaring revenues, AI programming companies have negative profit margins, raising concerns about the sustainability of their business models [2][4]
Financial Performance
- Anysphere, the company behind Cursor, reached $500 million in annual recurring revenue (ARR) in June. Cursor had earlier become the fastest company in SaaS history to reach $100 million ARR [2]
- Replit's annual revenue surged from $2 million in August last year to $144 million recently. Lovable grew from $1 million to $100 million in annual revenue within eight months [2]
Profitability Challenges
- AI programming companies such as Windsurf are struggling with operating costs that exceed their revenue, resulting in significantly negative gross margins [4][5]
- Gross margins for AI programming companies generally range from 20% to 40%, before accounting for the cost of serving free users [4]
Cost Structure
- The high cost of large language model calls is the main drag on profits. Unlike in traditional software, these expenses rise as the user base grows [5][6]
- Variable costs for startups in this sector are estimated at 10% to 15%, which makes this a high-cost business for companies that do not develop their own models [5]
Strategic Options
- AI programming companies face difficult choices: developing their own models, being acquired, or passing costs on to users [7][8]
- Anysphere has announced plans to develop its own models, but progress has been slow. Some companies, such as Windsurf, have abandoned this route because of its high cost [8]
Industry Outlook
- The profitability crisis raises questions about the sustainability of the entire AI programming sector [9]
- Direct competition from model providers such as OpenAI and Anthropic poses additional challenges, because they are both suppliers and competitors [9]
- Investors are increasingly concerned about user loyalty, since users may quickly switch to superior tools built by competitors [9]
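The margin squeeze described above comes from inference costs that scale with usage. The minimal sketch below shows how gross margin falls as paying users consume more tokens or as free users are served. All prices, user counts, and token volumes are purely hypothetical and do not come from the article. LLM API spend is the only cost of revenue modeled.

```python
def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin = (revenue - cost of revenue) / revenue."""
    return (revenue - cost_of_revenue) / revenue

def monthly_gross_margin(paying_users: int, price: float, tokens_per_paying_user: float,
                         usd_per_million_tokens: float, free_users: int = 0,
                         tokens_per_free_user: float = 0.0) -> float:
    """Only paying users generate revenue, but every token served, to anyone, costs money."""
    revenue = paying_users * price
    total_tokens = paying_users * tokens_per_paying_user + free_users * tokens_per_free_user
    inference_cost = total_tokens / 1e6 * usd_per_million_tokens
    return gross_margin(revenue, inference_cost)

# Hypothetical: 1,000 subscribers at $20/month, model API at $6 per million tokens.
base = monthly_gross_margin(1000, 20.0, 2_000_000, 6.0)
with_free = monthly_gross_margin(1000, 20.0, 2_000_000, 6.0,
                                 free_users=2000, tokens_per_free_user=500_000)
heavy = monthly_gross_margin(1000, 20.0, 4_000_000, 6.0)
print(f"{base:.0%} {with_free:.0%} {heavy:.0%}")
```

In traditional SaaS, serving one more user costs little. Here every additional token adds cost, so rising revenue alone does not fix the margin; heavier usage at a flat subscription price can push it below zero.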
Unitree Advances Its IPO; Wang Xingxing on Industry Pain Points: Hardware Is Good Enough for Now, Embodied AI Is Holding Things Back
Hua Xia Shi Bao· 2025-08-12 00:24
Group 1
- The company's core objective is to make robots do useful work rather than merely entertain or fight, emphasizing the importance of practical applications [1]
- The company is currently the most prominent player in China's humanoid robot sector and drew significant interest at the 2025 World Robot Conference. The industry's commercialization, however, is still at an early stage [1][3]
- The company has begun its listing process with CITIC Securities as its advisory firm. It views the listing as a step toward more mature management and operations [2]
Group 2
- The company reported revenue exceeding 1 billion yuan last year and has been profitable for five consecutive years since 2020, indicating strong financial health [2]
- The G1 humanoid robot reportedly has the highest global shipment volume this year. The Go2 quadruped robot has also sold well, with sales of 23,700 units in 2024, about 69.75% of the global market [2]
- The company has cut prices to stimulate sales: the G1 starts at 99,000 yuan, and the new, smaller R1 humanoid robot is priced at 39,900 yuan. The aim is to attract more users and build an ecosystem [3]
Group 3
- The main obstacle to humanoid robot development is inadequate embodied intelligence AI rather than hardware limitations [4]
- Developing embodied intelligence models is significantly more complex than developing language models, because it requires real-time perception and decision-making [5]
- Collaboration between robot manufacturers and large model developers is essential for advancing embodied intelligence models, since many robot companies lack the necessary AI model technology and GPU resources [6]
Doubting VLA Models and Saying AI Is Far from Sufficient? An Industry Practitioner Responds to Unitree's Wang Xingxing
Di Yi Cai Jing· 2025-08-11 11:33
Core Viewpoint
- Traditional humanoid robots face three core challenges: perception limitations, decision-making gaps, and generalization bottlenecks [5]
Group 1: Industry Challenges
- The industry cannot yet use full-parameter models effectively. This points to the need for deeper collaboration between the robot's "brain", "cerebellum", and limbs [2]
- Traditional robots often rely on preset rules to execute tasks, which makes it difficult to adapt to complex and dynamic environments [5]
- When switching between tasks, robots require manual reprogramming or strategy adjustments [5]
Group 2: Perspectives on the VLA Model
- The VLA (Vision-Language-Action) model is a controversial yet pivotal paradigm for humanoid robot motion control, and many in the industry are betting on its potential [4]
- OpenVLA, built on the 7-billion-parameter Llama 2 language model, is an example of a relatively small model that still struggles to make effective use of large language models [4]
- The industry is urged to explore how to divide computing power between cloud and edge devices to build a comprehensive deployment architecture [4]
Group 3: Future Directions
- The ideal "brain" model for humanoid robots should be not just a large language model but a complete system that deeply integrates hardware and software [4]
- The industry is encouraged to rethink the VLA model and seek new paradigms, potentially drawing on biomimicry to develop original foundational models for embodied intelligence [6]
- Confidence in the humanoid robot industry is growing. Many believe it will become a major sector and that this year may mark a turning point toward mass production [6]