GRANT
Xiaomi has 7 papers accepted at the top conference AAAI, with full coverage of frontier fields!
自动驾驶之心· 2025-12-22 03:23
Core Viewpoint
- Xiaomi has made significant strides in AI research, with seven papers accepted at AAAI 2026, showcasing its comprehensive capabilities across various AI domains, including sound editing, speech Q&A, embodied intelligence, and autonomous driving [5][6][41].

Group 1: Research Achievements
- Xiaomi's seven accepted papers cover a wide range of AI research areas, demonstrating its commitment to foundational technology and long-term investment in AI [6][41].
- The research topics include sound effect editing, speech question answering, 3D embodied agents, visual language navigation, retrieval models, inference decoding strategies, and autonomous driving [6][41].

Group 2: AutoLink Framework
- AutoLink addresses the challenges of large-scale text-to-SQL by allowing models to explore database schemas iteratively rather than loading all data at once, achieving a strict recall of 97.4% on Bird-Dev and 91.2% on Spider-2.0-Lite [9][10].
- This framework enables LLMs to act like intelligent agents, dynamically identifying the schema parts relevant to SQL generation, thus enhancing efficiency and scalability [10].

Group 3: SpecFormer Model
- SpecFormer redefines the role of draft models in speculative decoding by integrating unidirectional and bidirectional attention, allowing for faster decoding without the need for complex draft trees [12][13][15].
- The model can understand context while generating predictions in parallel, leading to lower training costs and better hardware compatibility for large-scale deployments [15].

Group 4: CLSR for Long-form Speech
- CLSR (Contrastive Language-Speech Retriever) improves long-form speech question answering by extracting relevant segments from lengthy audio recordings, enhancing accuracy and efficiency [17][20].
- This approach filters out irrelevant information and allows large models to focus on key content, significantly improving performance in speech Q&A tasks [20].
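The draft-and-verify loop that speculative decoding relies on (Group 3) can be sketched in a few lines. This is a minimal greedy-verification toy, not SpecFormer's actual architecture; the two "models" are hypothetical stand-ins that simply read tokens off a fixed reference sequence.

```python
def speculative_decode(target_next, draft_next, prefix, k, steps):
    """Toy greedy speculative decoding: a cheap draft model proposes k
    tokens per step; the target model verifies them and keeps the longest
    matching prefix, substituting its own token at the first mismatch."""
    out = list(prefix)
    for _ in range(steps):
        # 1) Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals (conceptually one parallel
        #    forward pass; written sequentially here for clarity).
        for t in proposal:
            expected = target_next(out)
            out.append(expected)        # each step emits at least one token
            if expected != t:           # first mismatch: discard the rest
                break
    return out

# Hypothetical "models" that read off a fixed reference sequence.
SEQ = list("abcdef")
target = lambda ctx: SEQ[len(ctx)]
draft = lambda ctx: SEQ[len(ctx)] if len(ctx) < 2 else "x"  # wrong after 2 tokens

print(speculative_decode(target, draft, [], k=3, steps=1))  # ['a', 'b', 'c']
```

In a real deployment the verification step is one batched forward pass of the target model, which is where the speedup comes from; SpecFormer's contribution, per the summary above, is a draft model whose mixed attention makes the proposals more accurate.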
Group 5: AV-Edit for Sound Editing
- AV-Edit revolutionizes sound effect editing by integrating visual, audio, and textual semantics, allowing for precise and contextually relevant sound modifications [21][24].
- The model utilizes a three-modal generative framework to achieve high-quality sound editing that aligns with video content, outperforming traditional methods [24].

Group 6: ORS3D for Task Scheduling
- ORS3D introduces a new task definition for embodied agents, focusing on parallel task execution and efficient scheduling in 3D environments [26][29].
- The GRANT model incorporates scheduling tokens to optimize task execution, demonstrating competitive performance in language understanding and spatial reasoning [28][29].

Group 7: SpNav for Spatial Navigation
- SpNav addresses a gap in embodied-intelligence navigation by combining high-level human instructions with spatial understanding, enabling robots to navigate complex environments effectively [33][35].
- The framework utilizes a dataset of 10,000 trajectories to train agents to understand spatial descriptions and execute precise navigation plans [35].

Group 8: VILTA for Autonomous Driving
- VILTA (VLA-in-the-Loop Trajectory Adversary) enhances autonomous driving strategies by generating adversarial trajectories for rare and complex scenarios, improving system robustness [37][40].
- The method integrates visual language models to refine trajectory generation, ensuring that the resulting paths are both diverse and physically feasible [40].
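The scheduling idea behind ORS3D (Group 6) can be illustrated with a small greedy sketch: start tasks with long unattended phases first, so their waits overlap the remaining hands-on work. The task list and numbers here are hypothetical, and GRANT delegates the actual scheduling to an operations-research solver rather than a greedy rule.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    active: float   # minutes of hands-on attention required
    wait: float     # minutes the task then runs unattended (0 = none)

def makespan(tasks):
    """Schedule hands-on work back to back, starting unattended phases
    as early as possible; return total elapsed time."""
    # Heuristic: longest unattended phase first, so waits overlap work.
    order = sorted(tasks, key=lambda t: -t.wait)
    clock = 0.0     # when the agent finishes its hands-on work
    finish = 0.0    # when the last task (including waits) completes
    for t in order:
        clock += t.active                     # hands-on phase occupies the agent
        finish = max(finish, clock + t.wait)  # unattended phase runs in parallel
    return finish

tasks = [
    Task("laundry", active=5, wait=40),   # hypothetical numbers
    Task("dishes", active=15, wait=0),
    Task("tidy desk", active=10, wait=0),
]
# Running each task to completion one after another would take 70 min;
# overlapping the unattended wait cuts this to 45 min.
print(makespan(tasks))  # 45.0
```

A scheduling token, in this picture, is how the model tells the solver which category each task falls into (needs continuous attention vs. can run in the background).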
X @BitMart
BitMart· 2025-12-19 18:50
🎉 To celebrate the listing of GrantiX (GRANT), we are giving away 50,000 GRANT in our Trading Competition! 🎁
💥 Trading Competition Event: 50,000 GRANT Giveaway!
💰 Trade now: https://t.co/LrTekkePhD
➡️ Register Now: https://t.co/yMWtjiXqwW
🔗 Details: https://t.co/YO8AN8YCgm ...
X @BitMart
BitMart· 2025-12-18 12:05
Listing Announcement
- GrantiX (GRANT) is listed on BitMart [1]
- Trading will be available on December 18, 2025 at 12:00 PM UTC [1]

Trading Information
- The deposit feature is currently available [1]
- Trade GrantiX (GRANT) at the provided link [1]
X @BitMart
BitMart· 2025-12-17 14:35
#BitMart is thrilled to announce the exclusive primary listing of GrantiX (GRANT) 🎉
💰 Trading pair: GRANT/USDT
💎 Deposit: Available
💎 Trading: 12/18/2025 12:00 PM UTC
➡️ Register Now: https://t.co/FjLIyBD2Dd
Learn more: https://t.co/I2GZDg4Y8c https://t.co/dYmOxRhTj5 ...
AAAI 2026 Oral | HUST & Xiaomi propose a new paradigm for embodied intelligence: teaching robots "time management"
具身智能之心· 2025-11-27 00:04
Core Viewpoint
- The article discusses the integration of operations research (OR) into embodied AI for improved task execution efficiency, highlighting a new dataset (ORS3D-60K) and a model (GRANT) that enhances robots' ability to perform parallel tasks, achieving a 30.53% increase in efficiency [2][22].

Group 1: Pain Points
- Current embodied AI systems often execute tasks sequentially, lacking the ability to recognize which tasks can be performed in parallel, leading to inefficiencies [3][5].
- The inability to utilize operations research knowledge means robots cannot optimize task scheduling in complex 3D environments [5][6].

Group 2: Contributions
- The ORS3D-60K dataset is introduced, comprising 4,376 real indoor scenes and 60,825 complex tasks, with an average instruction length of 311 words, significantly more complex than previous datasets [12][13].
- Each task in the dataset has been validated by an operations research solver, distinguishing between parallelizable and non-parallelizable tasks, thus enabling optimal scheduling [13][22].

Group 3: Methodology
- The GRANT model is proposed, which includes a scheduling token mechanism that lets the model predict task attributes and use an external optimization solver for efficient scheduling [16][19].
- GRANT combines a 3D scene encoder, a large language model (LLM), the scheduling token mechanism, and a 3D grounding head to achieve optimal task execution [19].

Group 4: Experimental Results
- Experiments on the ORS3D-60K dataset show that GRANT achieves state-of-the-art performance, with a 30.53% increase in task completion efficiency and a 1.38% improvement in 3D grounding accuracy [18][21].
- The model effectively utilizes waiting periods in tasks to perform other actions, reducing total task time from 74 minutes to 45 minutes, a 39% efficiency improvement [21].

Group 5: Summary and Outlook
- The research indicates a shift in embodied AI from basic semantic understanding to advanced operational decision-making, with potential for real-world applications in robotics [22].
- The framework aims to bridge the gap between multimodal large models and optimization solvers, paving the way for robots that can manage time effectively in daily tasks [22].
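The headline figure is easy to check against the reported timings: cutting total task time from 74 to 45 minutes is a reduction of (74 - 45) / 74.

```python
# Reported total task time drops from 74 to 45 minutes [21].
before, after = 74, 45
print(f"{(before - after) / before:.1%}")  # 39.2%, i.e. the ~39% improvement cited
```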
AAAI'26 Oral | HUST & Xiaomi propose a new paradigm: teaching robots "time management", raising task efficiency by over 30%!
具身智能之心· 2025-11-26 10:00
Core Viewpoint
- The article discusses the introduction of operations research (OR) knowledge into embodied AI for task planning, highlighting a new dataset (ORS3D-60K) and a model (GRANT) that improves task execution efficiency by 30.53% [2][22].

Group 1: Pain Points
- Current embodied AI systems struggle with task planning because they assume tasks must be completed sequentially, lacking the ability to recognize parallelizable tasks [3][6].
- The inability to utilize OR knowledge prevents robots from efficiently managing time and resources in complex 3D environments [6][8].

Group 2: Contributions
- The ORS3D-60K dataset is introduced, comprising 4,376 real indoor scenes and 60,825 complex tasks, with an average instruction length of 311 words, significantly more complex than previous datasets [10][12].
- Each task in the dataset has been validated by an OR solver, distinguishing between tasks that require continuous attention and those that can run in the background, thus enabling optimal scheduling [12][22].

Group 3: Methodology
- The GRANT model is proposed, which integrates a scheduling token mechanism (STM) to extend existing multimodal models, letting them predict task attributes and use an external optimization solver for efficient scheduling [16][19].
- GRANT's architecture includes a 3D scene encoder, a large language model (LLM), the STM, and a 3D localization head, effectively combining language understanding with time management [19][22].

Group 4: Experimental Results
- Experiments on the ORS3D-60K dataset show that GRANT achieves state-of-the-art performance, with a 30.53% increase in task completion efficiency and a 1.38% improvement in 3D grounding accuracy [18][21].
- The model effectively utilizes waiting periods in tasks to parallelize operations, reducing total task time from 74 minutes to 45 minutes, a 39% efficiency improvement [21].

Group 5: Summary and Outlook
- This research marks a shift in embodied AI from basic semantic understanding to advanced operational decision-making, aiming to create intelligent agents capable of efficient time management in real-world applications [22].