Computer Vision
Two Jiangsu Projects Selected for the Ministry of Culture and Tourism's Construction List
Jiang Nan Shi Bao· 2025-07-22 13:48
Core Insights
- The Ministry of Culture and Tourism has announced the second batch of technology innovation centers, focusing on interactive simulation amusement equipment and 3D animation production tools [1][2]

Group 1: Technology Innovation Centers
- The second batch includes two centers: one for interactive simulation amusement equipment and another for 3D animation production tools [1][2]
- The first batch consisted of 11 units focusing on performing arts equipment, smart tourism, and cultural digitization [1]

Group 2: Interactive Simulation Amusement Equipment Center
- The center, based at XuZhou Top Interactive Intelligent Technology Co., aims to address key industry issues such as equipment intelligence and integration [1]
- The center focuses on technical innovations in design, system integration, and content distribution for simulation amusement devices [1]
- Top operates a modern standardized production base of over 30,000 square meters and is recognized as a leading enterprise in the amusement equipment technology sector [1]

Group 3: 3D Animation Production Tools Center
- The center, based at Jiangsu Yuanli Digital Technology Co., focuses on developing innovative 3D digital technology products and services [2]
- It builds software systems around computer vision and deep learning algorithms to improve animation production efficiency [2]
- Yuanli is recognized as a national "specialized and innovative" small giant enterprise, producing acclaimed animations and high-precision 3D scanning data packages [2]
How to Create High-Quality Vision Datasets
36Kr· 2025-07-21 03:43
Group 1
- The importance of high-quality computer vision datasets is emphasized, as the adoption rate of AI in enterprises has increased by 270% over the past four years, driving rapid integration of computer vision applications [1][2]
- High-quality data is crucial for training, validating, and testing computer vision models, as model performance heavily relies on the quality and quantity of the data used [1][3]
- The article outlines three types of datasets used in computer vision: training data, validation data, and testing data, each serving a specific purpose in model development [3]

Group 2
- Five key features of high-quality computer vision datasets are identified: accuracy, diversity, consistency, timeliness, and privacy [5][6]
- Low-quality data can lead to challenges such as overfitting and underfitting, which significantly impact model performance [7][9]
- The article discusses techniques for maintaining high-quality datasets, including reliable data collection, preprocessing, and appropriate dataset splitting [11][13]

Group 3
- The future of computer vision datasets is shifting towards a data-centric approach, focusing on improving dataset quality rather than solely optimizing models [14]
- The article highlights the role of image datasets in AI and machine learning, particularly in applications like medical imaging, autonomous vehicles, facial recognition, and retail analytics [15][16]
- Ethical considerations in dataset creation are crucial to avoid bias and ensure fairness in AI systems [21][22]

Group 4
- The article provides best practices for collecting high-quality image data, emphasizing the importance of clarity, resolution, and diversity [22][23]
- Various sources for image data collection are discussed, including public datasets, web scraping, and custom data collection, each with its advantages and disadvantages [24][30]
- Annotation techniques are critical for ensuring accurate model training, with different types of annotations suited for specific machine learning tasks [25][27]

Group 5
- Quality assurance techniques are necessary to maintain high standards in dataset annotation and overall model performance [41]
- Regular maintenance and updates of datasets are essential to keep AI models relevant and accurate in changing real-world conditions [46]
- The article concludes that a systematic approach to creating effective image datasets is vital for building high-performance AI models [47]
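The three-way split into training, validation, and testing data described above can be sketched as a shuffled proportional split; the 70/15/15 ratio and the seed below are illustrative choices, not figures from the article:

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle and split a list of samples into train/validation/test sets.

    The remainder after the train and validation fractions becomes the
    test set, so every sample lands in exactly one split.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 700 150 150
```

For real image datasets a stratified split (preserving class proportions in each subset) is usually preferable, which is the kind of consistency the article's quality criteria point at.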
Compete This Summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge Launches
自动驾驶之心· 2025-07-17 07:29
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, focusing on visual perception as a key technology for applications in autonomous driving, smart cities, and robotics [2][4]

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications in various fields [2]
- The competition seeks to promote high-efficiency and high-quality research in spatial and embodied intelligence technologies [4]
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4]

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions such as Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5]
- It is supported by sponsors and technical support units, including Beijing Jiuzhang Yunjing Technology Co., Ltd. [5]

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and task-specific simulation environments [11]
- The sponsor will provide free computing resources, including H800 GPU power, for validating and testing submitted algorithms [12][13]

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with specific tasks and evaluation methods [17]
- The Spatial Intelligence track involves constructing a 3D reconstruction model from multi-view aerial images [17]
- The Embodied Intelligence track focuses on completing tasks in dynamic occlusion simulation environments [17]

Group 5: Evaluation Methods
- Evaluation for Spatial Intelligence covers rendering quality and geometric accuracy, with specific metrics such as PSNR and F1-Score [19][20]
- For Embodied Intelligence, evaluation assesses task completion and execution efficiency, with metrics such as success rate and average pose error [23][21]

Group 6: Awards and Recognition
- Each track will have awards, including cash prizes and computing vouchers, sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd. [25]
- Awards include a first prize of 6,000 RMB and 500 computing vouchers, with additional prizes for second and third places [25]

Group 7: Intellectual Property and Data Usage
- Participants must sign a data usage agreement, ensuring that the provided datasets are used solely for the competition and deleted afterward [29]
- Teams must guarantee that their submitted results are reproducible and that all algorithms and related intellectual property belong to them [29]

Group 8: Conference Information
- The 8th China Conference on Pattern Recognition and Computer Vision (PRCV 2025) will be held from October 15 to 18, 2025, in Shanghai [27]
- The conference will feature keynote speeches from leading experts and various forums to promote academic and industry collaboration [28]
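PSNR, one of the rendering-quality metrics named for the Spatial Intelligence track, is derived from the mean squared error between a rendered image and its reference view. The minimal sketch below works on flat lists of pixel intensities for clarity; it is an illustration of the metric's definition, not the competition's official scoring code:

```python
import math

def psnr(rendered, reference, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means the rendering
    is closer to the reference image."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# A rendering uniformly off by 10 intensity levels:
print(round(psnr([100] * 64, [110] * 64), 2))  # 28.13
```

In practice the metric is computed per image over full RGB arrays (e.g. with NumPy or OpenCV) and averaged across test views.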
Don't Brute-Force Your Autonomous Driving Research! The Right Approach Lets You Overtake on the Curve
自动驾驶之心· 2025-07-11 01:14
Core Viewpoint
- The article emphasizes the importance of learning from experienced mentors in research, particularly in LLM/MLLM, to accelerate the research process and achieve results more efficiently [1]

Group 1: Course Offerings
- The program offers a 1v6 elite small-class format, allowing for personalized guidance from a mentor throughout the research process [5]
- The course covers everything from model theory to practical coding, helping participants build their own knowledge systems and understand algorithm design and innovation in LLM/MLLM [1][10]
- Participants will receive tailored ideas from the mentor to kickstart their research, even if they initially lack a clear direction [7]

Group 2: Instructor Background
- The instructor has a strong academic background, having graduated from a prestigious computer science program and worked as an algorithm researcher at various companies [2]
- The instructor's research covers computer vision, efficient model compression algorithms, and multimodal large language models, with a focus on lightweight models and efficient fine-tuning techniques [2][3]

Group 3: Target Audience
- The program suits graduate students and professionals in autonomous driving and AI, as well as anyone looking to strengthen their algorithmic knowledge and research skills [11]
- It caters to individuals who need to publish papers for academic recognition or who want to systematically master model compression and multimodal reasoning [11]

Group 4: Course Structure and Requirements
- The course accommodates students with varying levels of foundational knowledge, adjusting the depth of instruction to participants' backgrounds [14]
- Participants are expected to have a basic understanding of deep learning and machine learning, familiarity with Python and PyTorch, and a willingness to engage actively in the learning process [16][19]
How Large Is the China-US AI Gap, and Where Is AI Competition Focused? The "Global AI Research Landscape Report" Debuts Worldwide
Tai Mei Ti APP· 2025-07-03 10:36
Core Insights
- The report titled "Global AI Research Landscape Report (2015-2024)" analyzes the evolution of AI research over the past decade, highlighting the competitive landscape between China and the United States in AI talent and publication output [2][7]

Group 1: AI Research Trends
- The report identifies four distinct phases in AI research: an initial phase (2015-2016), a rapid development phase (2017-2019), a maturity peak phase (2020-2023), and an adjustment phase (2024) [4][5]
- The number of AI papers published globally increased significantly, peaking at 17,074 papers in 2023, nearly a fourfold increase from 2015 [5][6]
- Publication volume is expected to decline to 14,786 papers in 2024, indicating a shift towards more specialized and application-oriented research [6]

Group 2: Talent Distribution
- China has emerged as the second-largest hub for AI talent, with a total of 52,000 researchers by 2024, growing at a compound annual growth rate of 28.7% since 2015 [8]
- The United States leads with over 63,000 AI researchers, with significant contributions from institutions like Stanford and MIT, as well as tech giants like Google and Microsoft [8][9]
- Chinese institutions such as the Chinese Academy of Sciences, Tsinghua University, and Peking University lead in publication output and talent concentration [7][9]

Group 3: Institutional and Corporate Performance
- The Chinese Academy of Sciences published 4,639 top-tier papers, with Tsinghua University and Peking University following closely, showcasing China's institutional strength in AI research [7][9]
- In contrast, U.S. companies like Google, Microsoft, and Meta have a significantly higher average publication output than their Chinese counterparts, reflecting a disparity in research investment and output capability [9][10]
- The top three U.S. companies published 5,896 papers, 1.8 times the output of the top three Chinese companies [9][10]

Group 4: Gender Disparity in AI Talent
- The report highlights a significant gender imbalance in AI research, with women making up only 9.3% of AI talent in China compared to 20.1% in the U.S. [12][13]
- Chinese institutions like Tsinghua University and Peking University have low female representation in AI, at 7.88% and 9.18% respectively, compared to 25%-30% at top U.S. institutions [12][13]

Group 5: Future Trends in AI Research
- "Deep learning" has been the dominant focus of AI research over the past decade, but its growth rate is expected to slow, suggesting a need for new approaches [14][15]
- Emerging technologies such as Transformers are gaining traction, particularly in natural language processing and multimodal AI, indicating a shift in research focus [15]
- The integration of traditional AI fields with deep learning techniques is becoming more prevalent, reflecting a trend towards collaborative and interdisciplinary research [15]
Lab Has 10 Papers Accepted at ICCV 2025
自动驾驶之心· 2025-07-02 13:54
Core Insights
- The article discusses the acceptance of 10 papers from a laboratory at the 20th ICCV International Conference on Computer Vision, highlighting advancements in 3D vision and related technologies [25]

Paper Summaries

Paper 1: Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
- This paper addresses domain generalization in 3D scene segmentation, proposing a framework that couples geometric embedding with semantic learning to enhance model generalization [1]

Paper 2: Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization
- The authors introduce a hierarchical variational method for dynamic prompt generation during inference, significantly improving the zero-shot generalization capabilities of vision-language models [3]

Paper 3: Knowledge-Guided Part Segmentation
- A new framework is proposed that utilizes structural knowledge to enhance the segmentation of fine-grained object parts, improving the understanding of complex structures [5][6]

Paper 4: TopicGeo: An Efficient Unified Framework for Geolocation
- TopicGeo presents a unified framework for geolocation that improves computational efficiency and accuracy by directly matching query images with reference images [9]

Paper 5: Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
- This paper explores a model that enhances the understanding of relationships in open-vocabulary scene graph generation through multimodal interaction learning [11]

Paper 6: VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
- The authors propose a mechanism that combines attribute and spatial information to improve the accuracy of 3D visual grounding tasks [13]

Paper 7: Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels
- A new metric called Dynamic Center Distance is introduced to enhance learning in the presence of noisy labels by focusing on hard samples [15]

Paper 8: Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition
- The paper presents a method for learning fine-grained representations from coarse labels without predefined category numbers, enhancing adaptability to dynamic semantic structures [17]

Paper 9: Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification
- This research addresses label imbalance in multi-label image classification by enhancing feature sensitivity for underrepresented categories [19]

Paper 10: Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization
- The authors redefine the task of text-to-point-cloud localization by allowing partial spatial matches, improving the model's ability to handle real-world ambiguities [21]
Geek+: Building a Full-Stack Technology Moat to Mine the Golden Track of Warehouse Automation
Sou Hu Cai Jing· 2025-07-02 09:30
Company Overview
- Beijing Geek+ Technology Co., Ltd. ("Geek+") is launching its IPO from today until July 4, 2025, with plans to list on the Hong Kong Stock Exchange on July 9, 2025 [2]
- The company plans to issue 140,353,000 H-shares, raising approximately HKD 2.358 billion at an issue price of HKD 16.80 per share [2]
- Geek+ has attracted four cornerstone investors, collectively subscribing USD 91.3 million (approximately HKD 716.7 million) [2]

Technology and Innovation
- Geek+ has developed a comprehensive technology stack covering hardware, software, and algorithms, creating a significant technological moat [3]
- The company introduced laser-vision fusion SLAM technology, achieving an industry-leading average positioning accuracy within ±10 mm [4]
- The Hyper+ core algorithm platform is among the most advanced in the AMR market, optimizing resource allocation and maximizing cost efficiency [5]
- Geek+ has created the world's first universal robot technology platform, Robot Matrix, improving R&D efficiency by over 30% [6][7]
- The company had filed over 2,000 patents by 2024, with its PopPick solution leading globally in compatibility and throughput efficiency [8]

Market Landscape
- The global AMR market is projected to grow from CNY 38.7 billion in 2024 to CNY 162.1 billion by 2029, a CAGR of 33.1% [10]
- The penetration rate of AMRs in warehouse automation is expected to rise from 4.4% in 2020 to 20.2% in 2029 [10]
- Key growth drivers include the booming e-commerce sector, increasing demand for logistics automation, and the need for manufacturing efficiency [13]
- AMR robots have diverse applications across industries including logistics, manufacturing, healthcare, and food service [14]

Competitive Advantages
- Geek+ has established a global service network and collaborates with partners like Bosch Rexroth and Mujin, creating a complete ecosystem from hardware to systems [18]
- The company has received strategic investments from firms like Warburg Pincus, Ant Group, and Intel, with net proceeds of approximately HKD 2.206 billion allocated to R&D and market expansion [19]
- Geek+ maintains a leading market share in the AMR sector, with revenue rising from CNY 790 million in 2021 to CNY 2.41 billion in 2024, a CAGR of 45% [23]
- The company has a customer repurchase rate of 74.6%, indicating strong client retention and satisfaction [24]

Industry Outlook
- The intelligent logistics automation industry is growing rapidly, with favorable policies supporting technological innovation and application promotion [15]
- Advances in AI, machine learning, computer vision, and IoT are enhancing AMR robot performance and functionality [16]
- The global labor shortage and the decline of China's demographic dividend are driving the shift towards automation, with Geek+ solutions reducing labor needs by 65% [17]
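The reported revenue CAGR can be verified directly from the endpoints: CNY 790 million (2021) to CNY 2.41 billion (2024) is three compounding years.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# Geek+ revenue in CNY millions, 2021 -> 2024 (3 years):
print(f"{cagr(790, 2410, 3):.0%}")  # 45%
```

The same formula reproduces the article's AMR market projection: growing CNY 38.7 billion (2024) at 33.1% per year for five years lands close to the cited CNY 162.1 billion for 2029.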
Major Livestream! Tsinghua & Bosch Open-Source a SOTA-Performance Pure VLA: Impromptu-VLA Says Goodbye to Dual Systems
自动驾驶之心· 2025-07-01 12:58
Core Viewpoint
- The article discusses advancements and challenges in autonomous driving systems, particularly in unstructured environments, and introduces the Impromptu VLA framework developed by Tsinghua AIR and Bosch Research Institute to address data gaps in these scenarios [1]

Group 1: Advancements in Autonomous Driving
- Current autonomous driving systems have made significant progress in structured environments like cities and highways, but still struggle in unstructured scenarios such as rural roads and construction zones [1]
- Existing large-scale autonomous driving datasets focus primarily on conventional traffic conditions, leaving a shortage of specialized, large-scale, finely annotated data for complex unstructured environments [1]

Group 2: Impromptu VLA Framework
- The Impromptu VLA framework provides an open-weight, open-data driving vision-language-action model: a fully end-to-end system that extracts multimodal features directly from driving video segments [1]
- Impromptu VLA generates driving commands in natural language without manually designed perception modules or intermediate representations [1]
- In the NeuroNCAP closed-loop safety evaluation, Impromptu VLA demonstrates strong decision robustness and generalization, significantly outperforming the BridgeAD system proposed at CVPR 2025 (2.15 vs. 1.60) [1]
Compete This Summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge Officially Launches
自动驾驶之心· 2025-06-30 12:51
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, focusing on visual perception as a key supporting technology for applications in autonomous driving, smart cities, and robotics [2][4]

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications in various fields [2]
- The competition seeks to promote high-efficiency and high-quality research in spatial and embodied intelligence technologies [4]
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4]

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions such as Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5]
- It is sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd., which also provides technical support [5]

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and task-specific simulation environments [11]
- The sponsor will provide free computing resources, including H800 GPU power, for validating and testing submitted algorithms [12][13]

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with specific tasks and evaluation methods [17]
- Spatial Intelligence requires building a 3D reconstruction model from multi-view aerial images, while Embodied Intelligence involves completing tasks in dynamic occlusion scenarios [17]

Group 5: Evaluation Methods
- Evaluation for Spatial Intelligence covers rendering quality and geometric accuracy, with scores based on PSNR and F1-Score metrics [19][20]
- For Embodied Intelligence, evaluation focuses on task completion and execution efficiency, with metrics such as success rate and average pose error [23][21]

Group 6: Submission and Awards
- Results must be submitted in a specified format, and top-ranking teams will have their results reproduced for evaluation [24]
- Awards for each track include cash prizes and computing vouchers, with a total of 12 awards distributed among the top teams [25]
ICCV 2025 Results Are Out! A 24% Acceptance Rate: Did You Grab Your Ticket to Hawaii?
机器之心· 2025-06-26 06:10
Core Insights
- The ICCV 2025 conference will take place from October 19 to 25 in Hawaii, USA, with a significant increase in paper submissions reflecting the rapid expansion of the computer vision field [2][4]
- A total of 11,239 valid submissions were received this year, with 2,699 papers recommended for acceptance, an acceptance rate of 24% [3][5]
- The acceptance rate has remained relatively stable in recent years, consistently hovering around 25% to 26% [5][8]
- The conference has implemented new policies to enhance accountability and integrity, identifying 25 irresponsible reviewers and rejecting 29 associated papers [6][7]

Submission Trends
- The number of submissions for ICCV 2025 is nearly three times that of 2019, indicating a surge in academic activity within the computer vision domain [4]
- Previous years show a steady increase: ICCV 2023 had 8,260 submissions with a 26.15% acceptance rate, ICCV 2021 had 6,152 submissions with a 26.20% acceptance rate, and ICCV 2019 had 4,323 submissions with a 25% acceptance rate [8]

Challenges in Peer Review
- The rapid increase in submissions poses unprecedented challenges for the peer review process, with major AI conferences now receiving more than 10,000 papers each [35]
- Concerns about review quality and reviewer accountability have grown, prompting discussions about reforming the traditional one-way review system into a two-way feedback loop [38][39]
- One proposed solution is a dual feedback system that lets authors evaluate review quality while giving reviewers formal recognition, aiming for a sustainable, high-quality peer review process [38][40]