Multimodal Large Models
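Former Head of JD's Intelligent Driving Unit Starts a Company: Xingyuan Intelligence (星源智) Aims to Build a General Embodied Brain | 36Kr Exclusive
36Kr · 2025-09-11 23:46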
Core Viewpoint - The article discusses the emergence of a new industrial revolution driven by AI, focusing on the development of embodied intelligence and its potential to solve last-mile delivery challenges in logistics [5][10].

Group 1: Company Overview
- Liu Dong, the founder of Xingyuan Intelligence, previously worked at JD Logistics, where he identified a last-mile delivery problem that existing automated solutions could not address [5][18].
- Xingyuan Intelligence recently completed a 200 million yuan angel round, with investment from venture capital firms and industry players [9][14].
- The company aims to develop a "general embodied brain" that extends the capabilities of robots in logistics and delivery [12][20].

Group 2: Technical Approach
- The company has chosen a layered architecture for its embodied intelligence system, separating the "brain", responsible for perception and planning, from the "cerebellum" (小脑), which executes actions [12][22].
- Liu Dong believes the industry currently lacks a low-cost way to collect real-robot data, which makes purely end-to-end models impractical at this stage [11][23].
- The layered approach lets robots start working and accumulate data, which can later be used to train more advanced models [23][24].

Group 3: Market Strategy
- Xingyuan Intelligence runs a dual-track business model: it acts as a Tier 1 supplier providing integrated solutions to robot manufacturers, and as a contractor delivering complete robotic solutions to end customers [14][30].
- The company focuses on specific scenarios, such as picking robots for supermarkets and pharmacies, which it sees as the fastest to deploy and generate revenue [36][42].
- Its pricing strategy keeps costs low so that replacing human labor with robots is attractive to clients [38][39].

Group 4: Commercial Viability
- The company has identified clear revenue growth paths and market opportunities, and expects its picking robots to be in operation next year [45][46].
- Liu Dong emphasizes that the ability to deploy solutions in real-world scenarios is crucial to the company's survival and success [13][46].
- The company plans to apply its technology to other uses, including navigation and inspection, which can generate revenue quickly [43][44].
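The brain/cerebellum split described in Group 2 can be pictured with a minimal Python sketch. Everything in it (the class names `Brain`, `Cerebellum`, `LayeredRobot`, the `SkillCommand` format, the skill names, and the pharmacy-style picking order) is hypothetical; it only illustrates the general idea of a layered design in which a high-level planner emits skill commands, a low-level executor carries them out, and every executed step is logged as data for training later models. It is not Xingyuan Intelligence's actual system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillCommand:
    """One step handed from the 'brain' to the 'cerebellum' (hypothetical format)."""
    skill: str   # e.g. "navigate", "grasp", "place"
    target: str  # object or location the skill acts on


class Brain:
    """Hypothetical high-level layer: perception and task planning."""

    def plan(self, order: list[str]) -> list[SkillCommand]:
        steps: list[SkillCommand] = []
        for item in order:
            steps.append(SkillCommand("navigate", f"shelf:{item}"))
            steps.append(SkillCommand("grasp", item))
            steps.append(SkillCommand("place", "tote"))
        return steps


@dataclass
class Cerebellum:
    """Hypothetical low-level layer: turns one skill command into motion."""
    supported: frozenset[str] = frozenset({"navigate", "grasp", "place"})

    def execute(self, cmd: SkillCommand) -> bool:
        # A real controller would run motion planning or a learned policy here;
        # this stub only reports whether the skill is in its repertoire.
        return cmd.skill in self.supported


@dataclass
class LayeredRobot:
    brain: Brain = field(default_factory=Brain)
    cerebellum: Cerebellum = field(default_factory=Cerebellum)
    episode_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def run(self, order: list[str]) -> bool:
        """Plan with the brain, execute with the cerebellum, log every step."""
        for cmd in self.brain.plan(order):
            ok = self.cerebellum.execute(cmd)
            # Logged real-robot steps are the kind of data a later, more
            # end-to-end model could be trained on.
            self.episode_log.append((cmd.skill, cmd.target, ok))
            if not ok:
                return False
        return True


robot = LayeredRobot()
print(robot.run(["aspirin", "bandage"]), len(robot.episode_log))  # True 6
```

Keeping the interface between the two layers as a small, explicit command type is what allows the low-level executor to be swapped or retrained independently, which matches the article's argument for layering while real-robot data is still scarce.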
The Road to Switching Careers into Autonomous Driving Algorithms: Learning Edition
Autonomous Driving Heart (自动驾驶之心) · 2025-09-10 23:33
Group 1
- The article introduces a learning package for the new academic season, including a 299 yuan discount card that gives 30% off all platform courses for one year [3][5].
- Other course benefits include two selected courses for a 1000 yuan purchase, plus discounts on specific classes and hardware [3][6].
- The focus is on cutting-edge autonomous driving technologies for 2025, particularly end-to-end (E2E) and VLA (Vision-Language-Action) autonomous driving systems [5][6].

Group 2
- End-to-end autonomous driving is presented as a core algorithm for mass production, with a note on the competition sparked by the UniAD paper winning the CVPR Best Paper award [6][7].
- The article notes how quickly the field is evolving, suggesting that earlier learning materials may no longer match current industry practice [7].
- It addresses the difficulties beginners face with fragmented knowledge and the lack of high-quality documentation in end-to-end autonomous driving research [7][8].

Group 3
- The article outlines courses on specific aspects of autonomous driving, including a small class on 4D annotation algorithms, which are crucial for generating training data [11][12].
- It highlights how automated 4D annotation makes data loops more efficient and improves the generalization and safety of autonomous driving systems [11].
- It also introduces a course on multimodal large models and practical autonomous driving, reflecting growing demand for skilled professionals in this area [15][16].

Group 4
- The instructors include Jason, described as a leading industry algorithm expert, and Mark, a specialist in 4D annotation algorithms [8][12].
- The curriculum is designed to offer a comprehensive learning experience that addresses real-world challenges and prepares students for jobs in the autonomous driving sector [23][29].
- The article emphasizes community engagement and support through dedicated VIP groups for course participants, where they can discuss and solve problems together [29].
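Whether the 299 yuan card described in Group 1 pays off is simple arithmetic: it saves 30% of list price, so it breaks even once list-price course spend reaches 299 / 0.30 ≈ 996.67 yuan. The Python sketch below assumes the discount applies to the full list price of every course bought during the year and does not stack with other offers; the function names are illustrative only.

```python
# Promotion terms as summarized above; treating "30% off" as paying 70%
# of list price, with no stacking, is an assumption.
CARD_PRICE_YUAN = 299.0
DISCOUNT_RATE = 0.30


def net_savings(list_price_spend: float,
                card_price: float = CARD_PRICE_YUAN,
                discount: float = DISCOUNT_RATE) -> float:
    """Yuan saved over the year after paying for the card (negative = loss)."""
    return list_price_spend * discount - card_price


def break_even_spend(card_price: float = CARD_PRICE_YUAN,
                     discount: float = DISCOUNT_RATE) -> float:
    """List-price course spend at which the card exactly pays for itself."""
    return card_price / discount


if __name__ == "__main__":
    print(f"break-even spend: {break_even_spend():.2f} yuan")  # 996.67
    print(f"savings on 2000 yuan of courses: {net_savings(2000):.2f} yuan")  # 301.00
```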
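Beating NVIDIA for Four Global Firsts! UBTECH's Self-Developed Humanoid Robot "Strongest Brain" Thinker Tops the Global Rankings!
Robot Circle (机器人圈) · 2025-09-10 09:07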
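Recently, UBTECH Thinker, a multimodal large model built on a ten-billion-parameter foundation model and serving as the "strongest brain" of UBTECH's self-developed humanoid robot Walker, took first place on four global leaderboards across three authoritative international benchmarks for robot perception and planning: the MS COCO Detection Challenge, RoboVQA, and Egoplan-bench2, initiated or proposed by Microsoft, Google, and others. The tasks covered twenty-one scenarios and four major types of task planning. The leaderboards drew top teams from around the world, including NVIDIA, the Beijing Academy of Artificial Intelligence (BAAI), and Shanghai AI Lab, and competition was fierce. The results not only demonstrate the all-round technical lead of UBTECH's robots in complex-environment perception, semantic understanding, and long-horizon task planning, but also mark a key evolution of the "strongest brain" of the Walker S series humanoid robots.

Multimodal Perception + Strong Reasoning and Planning Enable Large-Scale Industrial Deployment

As the wave of intelligence sweeps the globe, planning capability has become one of the key dimensions of competition among humanoid robots. Traditional robot systems execute tasks from preset instructions and struggle in highly dynamic, changeable real-world scenarios. The core of these three benchmarks is the systematic validation of humanoid robots' multimodal perception and reasoning-planning capabilities in complex environments. The MS COCO detection challenge, initiated by Microsoft, is ... in the field of computer vision ...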
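The COCO detection challenge mentioned above scores detectors by matching predicted boxes to ground-truth boxes using intersection over union (IoU), and its headline AP metric averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The Python sketch below shows only the IoU and threshold-matching step, not the full AP computation, and uses corner-format boxes `(x_min, y_min, x_max, y_max)` for simplicity rather than COCO's `[x, y, width, height]` annotation format; the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x_min, y_min, x_max, y_max) form."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO's primary AP averages over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Which COCO IoU thresholds a single prediction/ground-truth pair clears."""
    score = iou(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if score >= t]
```

For example, a prediction covering 80% of a ground-truth box (IoU 0.8) counts as a match at seven of the ten thresholds, which is why COCO AP rewards tight localization rather than mere detection.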
World's First L4-Level Energy AI Agent Improves Prediction Accuracy by More Than 30% over Traditional Methods | Innovation Scenarios
Tai Mei Ti APP · 2025-09-08 01:13
Core Insights
- LEMMA, launched by ELU Technology Group, is the world's first L4-level energy AI Agent, representing a significant breakthrough in AI application within the energy sector [1]
- The solution is based on the concept of "bit empowering watt," using the self-developed ILM (Infinity Large Model) for AI decision-making and the HEE (Hyper Energy Engine) as its technological foundation [1]
- LEMMA moves energy systems from traditional passive response to proactive intelligent services, enabling autonomous market monitoring, opportunity discovery, strategy formulation, and decision execution [1]

Technical Architecture
- The core engine of the L4-level AI Agent is designed to support complex scene understanding and reasoning [2]
- It features a complete closed loop of proactive perception, autonomous decision-making, and intelligent execution [2]
- The system supports multimodal data fusion, including text, numerical, image, and time-series data [2]

Application Scenarios
- LEMMA is applicable to energy trading, virtual power plant scheduling, energy storage system optimization, and load forecasting [1][2]
- It autonomously monitors various trading products in the electricity spot market and the ancillary services market [2]
- The system can automatically formulate and execute optimal trading strategies while optimizing the allocation of distributed energy resources [2]

Performance Outcomes
- The accuracy of short-term load forecasting has reached 98.5%, an improvement of over 30% compared to traditional methods [4]
- Price prediction accuracy has improved by 35%, providing a more reliable basis for trading decisions [4]
- The system's decision response time has been reduced from minutes to milliseconds, supporting high-frequency trading scenarios [4]

Economic and Social Impact
- Trading revenue in pilot projects has increased by 25-40% compared to traditional methods, while operational costs have decreased by over 30% [4]
- The technology has processed transaction amounts exceeding 100 billion, covering various types of clients including power generation companies and industrial users [4]
- LEMMA contributes to carbon neutrality goals and promotes the digital transformation of the energy industry [3][6]

Industry Influence
- As the first L4-level energy AI Agent, LEMMA sets a technological benchmark in the industry and fosters a new ecosystem for energy AI applications [6]
- The solution supports traditional energy companies on their transformation and upgrade paths, moving the energy sector toward intelligent and digital development [6]
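The article does not define how "accuracy" is measured, so a load-forecasting figure of 98.5% alongside an improvement of "over 30%" is ambiguous. The Python sketch below assumes one common convention, accuracy = 1 − MAPE (mean absolute percentage error), and shows the difference between a relative error reduction and a change in accuracy points. The function names and the example numbers (baseline MAPE 2.2%, new MAPE 1.5%) are hypothetical, not figures from the article.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error as a fraction (0.015 means 1.5%)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def accuracy(actual: list[float], forecast: list[float]) -> float:
    """Assumed convention (not stated in the article): accuracy = 1 - MAPE."""
    return 1.0 - mape(actual, forecast)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error shrank relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical: baseline MAPE 2.2% (accuracy 97.8%) vs. new MAPE 1.5% (98.5%).
print(f"{relative_error_reduction(0.022, 0.015):.1%}")  # 31.8%
```

Under these hypothetical numbers, accuracy rises by only 0.7 percentage points while the error shrinks by about 32%, so both of the article's figures can hold at once if the 30% refers to error reduction; the article does not say which it means.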
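Is There a "Pure-Blood VLA" in Autonomous Driving? A Roundup of What VLMs Can Actually Do in Autonomous Driving
Autonomous Driving Heart (自动驾驶之心) · 2025-09-06 16:05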
Core Viewpoint - The article discusses the challenges and methods involved in building datasets for autonomous driving, focusing on VLA (Vision-Language-Action) models and their use in trajectory prediction and scene understanding [1].

Dataset Handling
- Datasets differ in their number of cameras; the VLM handles this by processing however many image tokens it receives, without needing an explicit camera count [2]
- Output trajectories are expressed in the ego vehicle's current coordinate system as relative (x, y) values rather than image coordinates, so mapping them onto images requires additional camera parameters [6]
- The VLA model generally follows the required output format; occasional deviations are corrected by Python post-processing that normalizes the format [8][9]

Trajectory Prediction
- Unlike traditional methods, VLA trajectory prediction gains scene-understanding ability through QA training, which improves its predictions for dynamic objects such as vehicles and pedestrians [11]
- Dataset construction faced challenges such as data quality problems and inconsistent coordinate formats, which were addressed through rigorous data cleaning and standardization [14][15]

Data Alignment and Structure
- Data is aligned by converting the various dataset formats into a unified relative displacement in the ego vehicle's coordinate system, organized in a QA format that covers trajectory prediction and dynamic-object forecasting [18]
- The input consists of images plus trajectory points from the previous 1.5 seconds, and the model predicts trajectory points for the next 5 seconds, following the SANA standard [20]

Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community focuses on cutting-edge autonomous driving technologies, covers nearly 40 technical directions, and fosters collaboration between industry and academia [22][24]
- The community offers a comprehensive learning platform, including video tutorials, Q&A sessions, and job opportunities in the autonomous driving sector [28][29]
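The alignment steps in the VLA dataset entry above (relative displacements in the ego frame, 1.5 s of history, 5 s of future, and Python post-processing of the model's text output) can be sketched in Python as follows. The 2 Hz sampling rate, the ego-frame convention (x forward, y left), the QA field names, the `(x, y)` answer string format, and all helper names are assumptions for illustration; the article does not specify them, and the "SANA standard" it mentions is not modeled here.

```python
import math
import re

# Assumptions (not from the article): 2 Hz samples, ego frame with x forward
# and y to the left, answers serialized as "[(x, y), ...]" with 2 decimals.
HZ = 2.0
HISTORY_S, FUTURE_S = 1.5, 5.0
N_HIST = round(HISTORY_S * HZ)  # 3 past points
N_FUT = round(FUTURE_S * HZ)    # 10 future points


def to_ego(points, ego_x, ego_y, ego_yaw):
    """Express global (x, y) points as displacements in the ego vehicle's frame."""
    c, s = math.cos(ego_yaw), math.sin(ego_yaw)
    # Translate to the ego position, then rotate by -yaw.
    return [(c * (x - ego_x) + s * (y - ego_y),
             -s * (x - ego_x) + c * (y - ego_y)) for x, y in points]


def make_qa_sample(track, t_now, yaw_now):
    """Build one QA sample around index t_now of a global track; None near the ends."""
    if t_now - N_HIST < 0 or t_now + N_FUT >= len(track):
        return None
    x0, y0 = track[t_now]
    history = to_ego(track[t_now - N_HIST:t_now], x0, y0, yaw_now)
    future = to_ego(track[t_now + 1:t_now + 1 + N_FUT], x0, y0, yaw_now)
    answer = "[" + ", ".join(f"({x:.2f}, {y:.2f})" for x, y in future) + "]"
    return {
        "history": history,  # paired with camera images in a real dataset
        "question": "Predict the ego trajectory for the next 5 seconds.",
        "answer": answer,
    }


# Accepts "(1.0, 2)" or "(1.0,2)" style pairs anywhere in the model's text.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(text, expected=N_FUT):
    """Normalize a model's free-form answer into a list of (x, y) floats."""
    points = [(float(a), float(b)) for a, b in _PAIR.findall(text)]
    if expected is not None and len(points) != expected:
        raise ValueError(f"expected {expected} points, got {len(points)}")
    return points
```

Because every sample stores displacements relative to the current ego pose, datasets with different global coordinate conventions end up in one shared format, and the tolerant regex parser lets slightly malformed model answers (extra words, missing spaces) still be scored, while a wrong point count is rejected explicitly.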
Autonomous Driving Heart's Back-to-School Season Is in Full Swing: 30% Off All Courses!
Autonomous Driving Heart (自动驾驶之心) · 2025-09-06 16:05
Group 1
- The article introduces a learning package for the new academic season, including a 299 yuan discount card that gives 30% off all platform courses for one year [3][5].
- Other course benefits include two selected courses for a 1000 yuan purchase, plus discounts on specific classes and hardware [3][6].
- The focus is on cutting-edge autonomous driving technologies for 2025, particularly end-to-end (E2E) and VLA (Vision-Language-Action) autonomous driving systems [5][6].

Group 2
- End-to-end autonomous driving is presented as a core algorithm for mass production, with a note on the competition sparked by the UniAD paper winning the CVPR Best Paper award [6][7].
- The article discusses the difficulty beginners face in mastering multimodal large models and the fragmented state of knowledge in the field, which can be discouraging [7][8].
- A course on automated 4D annotation algorithms is introduced to address the growing complexity of training-data requirements for autonomous driving systems [11][12].

Group 3
- The article outlines a course on multimodal large models and their practical application in autonomous driving, reflecting rapid growth in demand for expertise in this area [15][16].
- It notes growing job opportunities in the field, with companies actively recruiting and offering competitive salaries [15][16].
- The course aims to provide a systematic learning path, from general multimodal large models to fine-tuning for end-to-end autonomous driving [16][18].

Group 4
- The article stresses the importance of community and communication in learning, with dedicated VIP groups where course participants discuss challenges and share insights [29].
- It highlights the need for practical guidance in moving from theory to practice, particularly for real-world applications and job readiness [29][31].
- It also mentions specialized small-group courses that address specific industry needs and build practical skills [23][24].
Long in Preparation: Let's Chat Online Next Week
Autonomous Driving Heart (自动驾驶之心) · 2025-09-05 07:50
Core Viewpoint - The article announces an online community focused on autonomous driving technology, aimed at knowledge sharing and networking among industry professionals and enthusiasts [5][12].

Group 1: Community and Activities
- The community has over 4,000 members and aims to grow to nearly 10,000 within the next two years, providing a platform for technical exchange and sharing [5][11].
- An online event is planned where community members can ask questions and interact with industry experts [1][3].
- Members come from leading autonomous driving companies and top academic institutions, creating a collaborative environment [12][20].

Group 2: Technical Focus Areas
- The community covers nearly 40 technical directions in autonomous driving, including multimodal large models, closed-loop simulation, and sensor fusion, for both beginners and advanced learners [3][5].
- It provides learning paths for topics such as end-to-end autonomous driving, multi-sensor fusion, and world models [12][26].
- It has compiled resources on open-source projects, datasets, and industry trends so members can find relevant information easily [24][25].

Group 3: Job Opportunities and Networking
- The community has set up a job referral mechanism with several autonomous driving companies, connecting job seekers with potential employers [8][54].
- Members can freely ask about career choices and research directions and receive guidance from experienced professionals [54][57].
- Regular discussions with industry leaders cover development trends and challenges in autonomous driving [57][59].
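School's Back: You Need an Autonomous Driving Learning Community Where Everyone Supports Each Other...
Autonomous Driving Heart (自动驾驶之心) · 2025-09-04 23:33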
Group 1
- The article discusses the autumn recruitment season through a student who received an offer from a Tier 1 company but felt unfulfilled because they wanted to move into a more advanced algorithm position [1].
- It encourages perseverance and self-challenge, arguing that pushing oneself reveals one's limits and potential [2].

Group 2
- A learning package is introduced, including a 299 yuan discount card giving 30% off courses for a year, various course benefits, and hardware discounts [4][6].
- The focus is on cutting-edge autonomous driving technologies for 2025, particularly end-to-end (E2E) and VLA autonomous driving systems, which are becoming central to the industry [7][8].

Group 3
- The article outlines what developing end-to-end autonomous driving algorithms requires, including knowledge of multimodal large models, BEV perception, reinforcement learning, and more [8].
- It highlights the difficulty beginners face in synthesizing knowledge from fragmented research papers and the lack of practical guidance for moving from theory to practice [8].

Group 4
- A new course on automated 4D annotation algorithms addresses the increasing complexity of training-data requirements for autonomous driving systems [11][12].
- The course is designed to help students handle the challenges of data annotation and make data loops in autonomous driving more efficient [12].

Group 5
- The article discusses the rise of multimodal large models in autonomous driving, noting rapid growth in related job opportunities and the need for a structured learning platform [14].
- It stresses the importance of practical experience and project involvement for job seekers in the autonomous driving sector [21].

Group 6
- The article mentions various specialized courses, including ones on perception, model deployment, planning and control, and simulation in autonomous driving [16][18][20].
- It highlights the importance of community engagement and support through dedicated VIP groups for course participants [26].
SDIC Intelligence (国投智能, 300188.SZ): Has Applied Multimodal Capabilities to Visual Understanding and Enhancement
Ge Long Hui · 2025-09-04 07:26
Core Viewpoint - The company has made significant progress in multimodal large models, applying them across its business lines to strengthen operational capabilities [1]

Group 1: Application of Multimodal Large Models
- The company uses dynamic rules and instructions to apply multimodal large models to behavior recognition, scene analysis, risk warning, and emergency command [1]
- Applying these models gives each video an "intelligent brain" that improves understanding of video content [1]
- By extracting target event information from video streams, the company has achieved comprehensive perception and builds a complete picture of the information [1]

Group 2: Integration with Smart Wearable Devices
- The multimodal capabilities have been applied to visual understanding and enhancement in smart wearable devices [1]
- Integrating data and service resources has created synergy between business scenarios and data capabilities [1]
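Opening Several Large-Model Technical Discussion Groups (RAG/Agent/General-Purpose Large Models, etc.)
Autonomous Driving Heart (自动驾驶之心) · 2025-09-04 03:35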
Group 1
- Technical discussion groups on large models are being set up, inviting participants to discuss topics such as RAG, AI Agents, multimodal large models, and large-model deployment [1]
- To join, interested readers add a designated WeChat assistant and send their nickname along with a request to join the large-model discussion group [2]