SAM

Flight Centre Travel Group (FLT) - 2025 H2 - Earnings Call Transcript
2025-08-27 01:02
Financial Data and Key Metrics Changes
- Overall Total Payment Volume (TPV) grew by 3%, but growth was inconsistent across brands and regions [3][4]
- Underlying Profit Before Tax (PBT) fell to just under $290 million, with significant impacts in Q1 and Q4 due to macro conditions [3][4]
- The company aims to hold underlying costs flat compared to FY 2025, despite a 3% increase in costs over the last twelve months [5][6]

Business Line Data and Key Metrics Changes
- The Corporate division grew its top line to $12.3 billion, with 6% PBT growth excluding Asia [7][8]
- The Leisure division grew TPV year on year, primarily from lower-margin brands, with profit falling due to soft trading conditions [8][9]
- Other segments remained flat year on year, with increased profit contributions from operating businesses [9]

Market Data and Key Metrics Changes
- ANZ and The Americas reported solid profit growth, while EMEA and Asia recorded reductions [3][4]
- The UK corporate travel brand underperformed, and Asia faced operational challenges leading to additional provisions [4][5]
- The company expects EMEA and Asia to return to more appropriate levels by 2026 [4]

Company Strategy and Development Direction
- The company is focusing on productivity gains, cost reduction, and targeted investments in technology and AI [5][6][20]
- A new Global Business Services division aims to support frontline teams and improve operational efficiency [5][6]
- The company is exploring M&A opportunities to expedite growth in specialist businesses [15][16]

Management's Comments on Operating Environment and Future Outlook
- Management acknowledges a challenging operating environment due to geopolitical tensions and macroeconomic conditions but remains optimistic about medium- to long-term growth [2][3]
- Promising signs are emerging in key markets, and the company is prepared for a market rebound [23][24]
- Management expects a challenging first half of FY 2026 but anticipates a stronger second half [43][44]

Other Important Information
- The company has undertaken $450 million in capital management initiatives, including debt repayment and share buybacks [9]
- Investment in TP Connect increased by $7 million to enhance airline content and new revenue streams [8]
- The company is launching a travel retail loyalty program to enhance customer engagement and drive growth [35][36]

Q&A Session Summary
Question: Can you provide details on the impact of lower overrides in FY 2025 and potential upside for 2026?
- Management indicated that lower overrides significantly impacted the leisure business, particularly in the last quarter, and emphasized the importance of growth to achieve higher override tiers [48][52]
Question: What are the potential impacts of changes to payment surcharges in Australia?
- Management has evaluated the potential impacts and is prepared with various options to mitigate any negative effects [54][57]
Question: Can you clarify the outlook for the first half of FY 2026?
- Management expects a like-for-like comparison to be relatively flat year on year, with improvements anticipated in Asia [60][62]
Question: What should be expected for the Other segment's loss in FY 2026?
- Management expects the loss to decrease to around $70 million, with improvements anticipated from operating businesses [68][70]
Question: How is Corporate Traveler positioned in the UK and Europe?
- Management expressed confidence in the UK market, highlighting recent management changes and improvements to the product offering [90][92]

Breaking Through SAM's Limitations! Meituan Proposes X-SAM: A Unified Framework That Sweeps 20+ Segmentation Benchmarks
自动驾驶之心· 2025-08-12 23:33
Core Insights
- The article introduces X-SAM, a new segmentation framework that overcomes the limitations of the Segment Anything Model (SAM) by enabling multi-task processing and integrating multi-modal capabilities [3][4][5]

Group 1: Limitations of SAM
- SAM was initially seen as a universal solution for visual segmentation but has significant limitations, including a single-task focus, an inability to understand text instructions, and the inefficiency of needing multiple models for different tasks [5][6][7]

Group 2: Innovations of X-SAM
- X-SAM integrates SAM's visual segmentation capabilities with multi-modal understanding from large language models (LLMs) through a unified input format, a dual-encoder architecture, and multi-stage training [12][13][21]
- The unified input format allows various segmentation tasks to be processed in a consistent manner, enhancing the model's ability to understand both text and visual prompts [13][15]
- The dual-encoder architecture consists of a global image encoder and a segmentation encoder, optimizing both overall scene understanding and pixel-level detail (see the sketch after this summary) [14][19]
- Multi-stage training involves fine-tuning the segmentation model, aligning visual and language features, and mixed fine-tuning across diverse datasets to enhance generalization [21][23]

Group 3: Performance Metrics
- X-SAM has demonstrated superior performance across more than 20 datasets and 7 core tasks, achieving state-of-the-art results on various segmentation benchmarks [27][28]
- On the COCO dataset, X-SAM achieved a panoptic quality (PQ) score of 54.7, closely following the best-performing model, Mask2Former [31]
- For open-vocabulary segmentation, X-SAM's average precision (AP) reached 16.2, significantly outperforming other models [31]
- In referring segmentation tasks, X-SAM achieved cumulative Intersection over Union (cIoU) scores of 85.1, 78.0, and 83.8 across different datasets, surpassing competitors [32]

Group 4: New Task Introduction
- X-SAM introduces a new task, visual grounded (VGD) segmentation, which allows the model to segment all instances of a class based on visual prompts, even across different images [25][26][35]
- In experiments, X-SAM achieved average precision scores of 47.9 to 49.7 for VGD segmentation, significantly exceeding existing models [35]

Group 5: Future Directions
- The research team plans to extend X-SAM's capabilities to video segmentation and dynamic scenes, aiming to enhance its application in temporal visual understanding [43]

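To make the dual-encoder idea concrete, below is a minimal PyTorch sketch of how a global image encoder, a segmentation encoder, and unified prompt tokens could be wired together. All module names, layer sizes, and the toy mask decoder are illustrative assumptions for exposition, not X-SAM's released architecture or code.

```python
# Minimal sketch of a dual-encoder segmenter in the spirit of X-SAM.
# All names and sizes are illustrative assumptions, not the released model.
import torch
import torch.nn as nn

class DualEncoderSegmenter(nn.Module):
    def __init__(self, embed_dim: int = 256, num_queries: int = 100):
        super().__init__()
        # Global image encoder: coarse patch tokens for scene-level understanding.
        self.global_encoder = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=16, stride=16),  # patchify
            nn.Flatten(2),                                       # (B, C, H*W)
        )
        # Segmentation encoder: higher-resolution features for pixel-level masks.
        self.seg_encoder = nn.Conv2d(3, embed_dim, kernel_size=8, stride=8)
        # Unified prompt handling: text or visual prompt features are projected
        # into the same query space before mask decoding.
        self.prompt_proj = nn.Linear(512, embed_dim)
        self.mask_queries = nn.Parameter(torch.randn(num_queries, embed_dim))
        self.decoder = nn.TransformerDecoderLayer(embed_dim, nhead=8, batch_first=True)

    def forward(self, image: torch.Tensor, prompt_feats: torch.Tensor):
        # image: (B, 3, H, W); prompt_feats: (B, P, 512) from a text or visual prompt encoder
        b = image.shape[0]
        global_tokens = self.global_encoder(image).transpose(1, 2)   # (B, N, C)
        pixel_feats = self.seg_encoder(image)                        # (B, C, H/8, W/8)
        queries = torch.cat(
            [self.mask_queries.expand(b, -1, -1), self.prompt_proj(prompt_feats)], dim=1
        )
        queries = self.decoder(queries, global_tokens)               # condition queries on the scene
        # Dot-product each query with pixel features to get per-query mask logits.
        return torch.einsum("bqc,bchw->bqhw", queries, pixel_feats)

model = DualEncoderSegmenter()
img = torch.randn(2, 3, 512, 512)
prompts = torch.randn(2, 4, 512)          # e.g. four encoded text/visual prompt tokens
print(model(img, prompts).shape)          # torch.Size([2, 104, 64, 64])
```

The point of the sketch is the split of responsibilities: one encoder feeds scene-level tokens to the decoder, the other supplies the dense features that the (learned plus prompt-derived) queries are matched against to produce masks.
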
Breaking Through SAM's Limitations! Sun Yat-sen University's X-SAM: A Unified Framework That Sweeps 20+ Segmentation Benchmarks
自动驾驶之心· 2025-08-12 10:37
Core Insights
- The article introduces X-SAM, a new segmentation framework that overcomes the limitations of the Segment Anything Model (SAM) by enabling multi-task processing and integrating multi-modal understanding capabilities [3][4][5]

Group 1: Limitations of SAM
- SAM was initially seen as a universal solution for visual segmentation but has significant limitations, including its inability to handle multiple tasks simultaneously and its lack of understanding of textual instructions [2][5][6]
- SAM is designed for single-object segmentation based on visual prompts and cannot perform complex tasks such as semantic, instance, or panoptic segmentation [6]
- The article highlights the gap between visual segmentation and multi-modal understanding: existing models can either understand images or perform pixel-level segmentation, but not both effectively [5][6]

Group 2: Innovations of X-SAM
- X-SAM is designed to fill the gap left by SAM, providing a unified segmentation framework that can handle various tasks and input types [7][8]
- The architecture includes a dual-encoder system that processes both visual and textual inputs, allowing for a comprehensive understanding of images and instructions [12][14]
- X-SAM introduces a unified input format that standardizes how different segmentation tasks are processed, enabling the model to understand both textual and visual prompts [13][15]

Group 3: Performance and Testing
- X-SAM has been tested across more than 20 segmentation datasets and 7 core tasks, outperforming existing models in all categories [4][27]
- The model achieved an average precision (AP) of 47.9 to 49.7 in visual grounded (VGD) segmentation, significantly surpassing previous models [26][35]
- In foundational tasks, X-SAM achieved a panoptic quality (PQ) of 54.7 on COCO panoptic segmentation, demonstrating its robustness [31]

Group 4: Training Methodology
- X-SAM employs a multi-stage training strategy that includes fine-tuning the segmenter, pre-training for alignment, and mixed fine-tuning across various datasets [21][23]
- The training process incorporates a data-balancing resampling strategy so that smaller datasets are not overshadowed by larger ones, optimizing overall model performance (a sketch of this idea follows this summary) [24]
- The architecture allows simultaneous training on multiple tasks, enhancing the model's generalization capabilities [37]

Group 5: Future Directions
- The research team plans to extend X-SAM's capabilities to video segmentation and dynamic scenes, aiming to bridge the gap between static image understanding and video comprehension [43]

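The data-balancing resampling idea mentioned above can be illustrated with a short, self-contained sketch: raw dataset sizes are raised to a power below 1 before normalization, so smaller datasets receive a proportionally larger share of training batches. The temperature value and the dataset sizes below are assumptions for illustration, not the ratios used by X-SAM.

```python
# Illustrative temperature-based dataset resampling for mixed fine-tuning.
# The exponent and dataset sizes are assumed values, not X-SAM's actual settings.
import random

def resampling_weights(dataset_sizes: dict[str, int], temperature: float = 0.5) -> dict[str, float]:
    """Up-weight small datasets by raising sizes to a power < 1 before normalizing."""
    scaled = {name: size ** temperature for name, size in dataset_sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}

def sample_dataset(weights: dict[str, float]) -> str:
    """Pick which dataset the next training batch is drawn from."""
    names, probs = zip(*weights.items())
    return random.choices(names, weights=probs, k=1)[0]

# Hypothetical sizes: one large panoptic set vs. smaller referring/interactive sets.
sizes = {"coco_panoptic": 118_000, "refcoco": 17_000, "interactive": 5_000}
weights = resampling_weights(sizes)
print(weights)                  # small sets get a larger share than their raw proportion
print(sample_dataset(weights))  # e.g. "coco_panoptic"
```

With the exponent at 1.0 this reduces to sampling in proportion to dataset size; pushing it toward 0 flattens the mixture, which is the usual lever for keeping small task datasets visible during mixed fine-tuning.
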
A Discussion of DreamVLA: Letting Robots Look First, Think Next, and Then Act
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- The article introduces DreamVLA, a new Vision-Language-Action (VLA) model that enhances robotic decision-making by integrating comprehensive world knowledge, allowing robots to predict dynamic environments and make more accurate action decisions [1][27]

Group 1: Background and Need for Advanced VLA Models
- Traditional VLA models map visual inputs and language commands directly to actions, which can lead to interference from irrelevant information in complex environments [3][5]
- DreamVLA addresses this by adding a layer of "thinking" that predicts world knowledge, including dynamic areas, depth information, and semantic features, before planning actions [5][27]

Group 2: Model Architecture and Functionality
- DreamVLA operates on a "perception-prediction-action" cycle, treating the task as an inverse dynamics problem to derive the necessary actions from predicted future states (see the sketch after this summary) [7][27]
- The model processes three types of inputs: visual images, language commands, and the robot's own state, using a dedicated encoder for each [10][14]

Group 3: World Knowledge Prediction
- DreamVLA predicts world knowledge, which includes dynamic areas, depth maps, and semantic features, rather than directly predicting actions [11][18]
- Dynamic-area prediction uses CoTracker to identify moving objects and generate masks that highlight relevant areas while filtering out static backgrounds [12][15]
- Depth prediction estimates the spatial relationships of objects, generating depth maps to assist in obstacle avoidance [13][17]
- Semantic prediction employs DINOv2 and SAM models to extract high-level semantic information, which is then encoded into a unified "world embedding" for action generation [18][22]

Group 4: Action Generation
- The action-generation component uses a diffusion Transformer to produce future action sequences based on the latent action embedding derived from multi-modal inputs [23][27]
- A structured attention mechanism ensures coherent multi-step action reasoning and prevents cross-modal knowledge leakage [19][31]

Group 5: Performance and Validation
- DreamVLA achieved an average task completion length of 4.44 on the CALVIN ABC-D benchmark, outperforming previous methods by 3.5%, with a real-world task success rate of 76.7% [25][27]
- Ablation studies confirmed the contributions of the various components, demonstrating the model's robustness and generalization capabilities [25][31]

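To ground the "perception-prediction-action" cycle described above, here is a minimal PyTorch sketch that first predicts a compact world embedding from observation and language features and then derives an action chunk from it, in the spirit of an inverse-dynamics formulation. The module names, dimensions, and simple MLP heads are assumptions for illustration; DreamVLA itself uses dedicated per-modality encoders and a diffusion Transformer for action generation.

```python
# Minimal sketch of a "predict world knowledge, then act" loop.
# Names, sizes, and MLP heads are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class WorldKnowledgePredictor(nn.Module):
    """Fuses observation and language features into a compact 'world embedding'
    standing in for predicted dynamic regions, depth, and semantics."""
    def __init__(self, obs_dim=512, lang_dim=512, world_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 512), nn.GELU(), nn.Linear(512, world_dim)
        )

    def forward(self, obs_feat, lang_feat):
        return self.fuse(torch.cat([obs_feat, lang_feat], dim=-1))

class InverseDynamicsPolicy(nn.Module):
    """Inverse-dynamics style head: maps (current observation, predicted future
    world embedding) to a short chunk of future actions."""
    def __init__(self, obs_dim=512, world_dim=256, action_dim=7, horizon=8):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.head = nn.Sequential(
            nn.Linear(obs_dim + world_dim, 512), nn.GELU(),
            nn.Linear(512, horizon * action_dim),
        )

    def forward(self, obs_feat, world_embedding):
        out = self.head(torch.cat([obs_feat, world_embedding], dim=-1))
        return out.view(-1, self.horizon, self.action_dim)

obs = torch.randn(1, 512)    # visual + proprioceptive features from the encoders
lang = torch.randn(1, 512)   # encoded language command
world = WorldKnowledgePredictor()(obs, lang)
actions = InverseDynamicsPolicy()(obs, world)
print(actions.shape)         # torch.Size([1, 8, 7]): 8 steps of a 7-DoF action
```

The separation mirrors the article's point: the policy never sees raw pixels directly at decision time; it conditions on a predicted summary of how the scene is expected to evolve, which is what filters out irrelevant background information.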