Computer Vision
Global Talent Recruitment: Faster R-CNN and ResNet Author Ren Shaoqing of USTC Recruits Professors, Scholars, and Students
机器之心· 2025-12-05 10:17
Core Viewpoint
- The article highlights the achievements and contributions of Professor Ren Shaoqing in artificial intelligence, particularly in deep learning and computer vision, emphasizing his role in advancing key technologies that impact sectors such as autonomous driving and medical imaging [4][5][6].

Group 1: Academic Achievements
- Professor Ren has made foundational and pioneering contributions in deep learning, computer vision, and intelligent driving, with his research serving as a core engine for critical areas of the national economy and people's livelihood [5].
- His academic papers have been cited more than 460,000 times, ranking him first among Chinese scholars across all disciplines [5].
- He has received multiple prestigious awards, including the 2023 Future Science Prize in Mathematics and Computer Science and the NeurIPS 2025 Test of Time Award [5].

Group 2: Key Research Contributions
- The paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," winner of the NeurIPS 2025 Test of Time Award, is considered a milestone in computer vision and has been cited more than 98,000 times since its publication in 2015 [6].
- Faster R-CNN introduced a fully learnable two-stage pipeline that replaced traditional region-proposal methods, achieving high accuracy at near real-time speed and significantly influencing the development of vision models over the past decade [6].

Group 3: Research Institute and Talent Recruitment
- The General Artificial Intelligence Research Institute at the University of Science and Technology of China focuses on cutting-edge areas such as AI, world models, embodied intelligence, and autonomous driving, aiming for integrated innovation across research, talent cultivation, and industrial application [7].
- The institute is actively recruiting for various positions, including professors, researchers, postdoctoral fellows, engineers, and students at different academic levels, with a commitment to supporting high-level talent projects [9][10].
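The two-stage pipeline credited to Faster R-CNN above scores region proposals and then prunes overlapping detections with non-maximum suppression (NMS). A minimal pure-Python sketch of greedy NMS (the `(x1, y1, x2, y2)` box format and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, discard any remaining
    box that overlaps it above iou_thresh, and repeat on the rest."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

With boxes `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first at IoU ≈ 0.68 and is suppressed, leaving indices `[0, 2]`.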
Liaoning Youth Scientist Forum Held in Shenyang
Liao Ning Ri Bao· 2025-11-24 01:04
Core Insights
- The 8th Liaoning Youth Scientist Forum was held in Shenyang, focusing on the integration of artificial intelligence technology with traditional industries to promote the development of the digital economy in Liaoning and the construction of a strong intelligent province [1]

Group 1: Forum Highlights
- The theme of the forum was "Intelligent Creation in Liaoning, AI Empowerment" [1]
- Academician Tang Lixin delivered a report titled "Intelligent Industrial Data Analysis and Optimization" [1]
- Experts from local universities and research institutions presented on topics such as AI-enabled industrial transformation, cutting-edge technologies, and innovative applications, focusing on areas like industrial intelligence, smart energy, robotics, medical-engineering integration, large models, and computer vision [1]

Group 2: Recommendations and Goals
- The forum suggested strengthening the foundation of industrial intelligence to promote the upgrading of traditional industries [1]
- It emphasized the importance of focusing on the application of cutting-edge technologies to open up new emerging sectors [1]
- The forum advocated for deep integration of "industry, academia, research, and application" to build an innovative ecosystem and solidify the talent support system to stimulate youth innovation [1]
AI Vision's "GPT Moment": Meta's New Models "Segment the World" in One Click, and Netizens Are Calling It Crazy
36Kr · 2025-11-20 10:04
Core Insights
- Meta has launched a new family of models called SAM 3D, which includes SAM 3D Objects for object and scene reconstruction and SAM 3D Body for human shape estimation [1][12]
- The SAM 3D series allows users to extract 3D models from 2D images with high accuracy, enabling 360-degree rotation without noticeable flaws [1][11]
- SAM 3 introduces a new feature called "promptable concept segmentation," enhancing the model's versatility in image segmentation tasks [1][19]

SAM 3D Objects
- SAM 3D Objects has achieved significant advancements in 3D object reconstruction, utilizing a data annotation engine that has labeled nearly one million images to generate over 3.14 million mesh models [7][9]
- The model outperforms existing leading models in human preference tests with a 5:1 advantage, enabling near-real-time 3D applications [10][11]
- SAM 3D Objects can reconstruct shapes, textures, and poses of objects, allowing users to manipulate the camera for different viewing angles [11][12]

SAM 3D Body
- SAM 3D Body focuses on human 3D reconstruction, accurately estimating human poses and shapes from single images, even in complex scenarios [12][13]
- The model supports prompt inputs, allowing users to guide predictions through segmentation masks and key points, enhancing interactivity [12][13]
- SAM 3D Body has been trained on approximately 8 million high-quality samples, ensuring robustness across diverse scenarios [13][16]

SAM 3 Model Features
- SAM 3 is a unified model capable of detecting, segmenting, and tracking objects based on text, example images, or visual prompts, significantly improving flexibility in segmentation tasks [18][19]
- The model has shown a 100% improvement in concept segmentation performance on the SA-Co benchmark compared to previous models [19][20]
- Meta has implemented a collaborative data engine involving both AI and human annotators to enhance data labeling efficiency and model performance [20][23]

Conclusion
- The rise of generative AI is transforming computer vision (CV) capabilities, expanding the boundaries of model training and dataset creation [24]
- Meta is actively applying these technologies in real business scenarios, suggesting that the SAM and SAM 3D series models may yield further innovations as data and user feedback accumulate [24]
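Segmentation models in the SAM family ultimately emit a binary per-object mask; what downstream code does with such a mask is easy to sketch. The NumPy snippet below (synthetic data; it does not call SAM 3's actual API, which the article does not document) cuts a masked object out of an image and computes its tight bounding box:

```python
import numpy as np

def extract_object(image, mask, background=0):
    """Keep only the pixels where the binary mask is set, filling the
    rest with a background value. image: (H, W, C); mask: (H, W) bool."""
    out = np.full_like(image, background)
    out[mask] = image[mask]
    return out

def mask_bbox(mask):
    """Tight bounding box (x1, y1, x2, y2) around a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```

The same mask-in, pixels-or-box-out pattern underlies both the 2D cutouts and the image-to-3D handoff described above.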
Seven "Deep Technologies" Set to Lead a Global Agricultural Transformation
Ke Ji Ri Bao· 2025-11-13 01:00
Core Insights
- The global agriculture sector is at a critical juncture, facing unprecedented pressures from climate change, resource degradation, demographic shifts, and geopolitical instability, necessitating a systemic transformation led by "deep technology" [1]
- Deep technology, which encompasses advanced scientific and engineering innovations, is expected to revolutionize the agricultural industry and address significant global challenges over the next decade [1]

Group 1: Deep Technology in Agriculture
- Deep technologies such as generative AI, computer vision, edge IoT, satellite remote sensing, robotics, CRISPR gene editing, and nanotechnology are identified as key drivers for transforming global agriculture into a more resilient, sustainable, and efficient system [1]
- The World Economic Forum's "AI in Agriculture Innovation Initiative" released a report highlighting the potential of these technologies to reshape agricultural practices [1]

Group 2: Generative AI
- Generative AI leverages advancements in large language models and the increasing availability of agricultural data, providing personalized crop management advice and localized farming plans [2]
- Applications include acting as an "AI advisor" for farmers, assisting governments in macro crop planning, and accelerating the development of new crop varieties through gene editing [2]
- The lack of high-quality training data, particularly for localized scenarios, remains a significant barrier to the widespread adoption of generative AI in agriculture [2]

Group 3: Computer Vision
- Computer vision enables machines to interpret images and videos, generating decision-making suggestions and reducing reliance on human analysis [3]
- In agriculture, it is used for precise identification of crop diseases, weeds, and pests, as well as real-time monitoring of crop growth [3]
- The variability of field conditions and plant growth stages poses challenges for the large-scale application of computer vision technology in agriculture [3]

Group 4: Edge IoT
- Edge IoT processes data at the device level or the nearby network edge, allowing for low-latency real-time responses and accelerating autonomous decision-making [4]
- It is particularly beneficial in rural areas with weak network coverage, facilitating applications such as automated irrigation and early disease warning systems [4]
- High equipment costs and interoperability issues between different edge systems are current challenges in this field [4]

Group 5: Satellite Remote Sensing
- Satellite remote sensing technology is increasingly applied in agriculture due to improved spatial and spectral resolution and higher data collection frequency [6]
- It allows for efficient monitoring of large geographic areas at low cost, assessing crop health and predicting pest outbreaks [6]
- The precision of satellite remote sensing needs improvement when dealing with small-scale, dispersed farmland or multi-crop rotations [7]

Group 6: Robotics
- Robotics technology automates labor-intensive or complex tasks in agriculture, integrating perception and decision-making capabilities [8]
- With advancements in AI perception and cloud-edge collaboration, agricultural robots can perform tasks such as precision planting and automated harvesting [8]
- High costs present challenges for adoption in countries with abundant low-wage labor [9]

Group 7: CRISPR Technology
- CRISPR gene editing is a key force in agricultural development, allowing precise modifications to DNA to enhance desirable traits in crops [10]
- It aims to accelerate the breeding of crops that are drought-resistant, pest-resistant, and nutritionally enhanced [10]
- Regulatory hurdles and public acceptance issues are significant challenges to the commercialization of CRISPR technology [11]

Group 8: Nanotechnology
- Nanotechnology shows potential in agriculture for pest control, nutrient management, and the controlled release of agricultural inputs [12]
- The lack of long-term data on environmental and health impacts poses challenges for the widespread application of nanotechnology [12]
- The report suggests that governments and institutions should support promising agricultural deep-tech projects through policy coordination, funding, talent development, and infrastructure building [12]
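The crop-health monitoring attributed to satellite remote sensing above typically rests on vegetation indices computed from spectral bands. The standard NDVI, (NIR − Red) / (NIR + Red), is simple enough to sketch; the band arrays here are synthetic, not from any real sensor:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index per pixel.
    Values near +1 indicate dense, healthy vegetation; values near 0
    or below indicate bare soil, water, or stressed crops."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps guards against 0/0
```

A pixel reflecting strongly in the near-infrared (0.8) and weakly in the red (0.2) scores 0.6, typical of healthy vegetation; equal reflectance in both bands scores 0.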
A World First: Landmark Nature Study Signals the End of Computer Vision's "Data Theft" Era
36Kr · 2025-11-06 08:13
Core Insights
- The article discusses the launch of FHIBE, the world's first publicly available, globally diverse dataset based on user consent, aimed at assessing fairness in human-centric computer vision tasks [2][5][17]
- FHIBE addresses ethical issues in data collection for AI, such as unauthorized use, lack of diversity, and social biases, which have been prevalent in existing datasets [2][6][17]

Dataset Overview
- FHIBE includes 10,318 images from 81 countries, representing 1,981 independent individuals, covering a wide range of visual tasks from facial recognition to visual question answering [2][6]
- The dataset features comprehensive annotation information, including demographic characteristics, physical attributes, environmental factors, and pixel-level annotations, enabling detailed bias diagnostics [3][7]

Ethical Considerations
- The data collection process adhered to ethical standards, including compliance with GDPR, ensuring informed consent from participants regarding the use of their biometric data for AI fairness research [10][17]
- Participants provided self-reported information such as age, pronouns, ancestry, and skin color, creating 1,234 cross-group combinations to enhance diversity [6][11]

Methodological Rigor
- FHIBE is designed specifically for bias assessment, ensuring it is used solely for measuring fairness rather than reinforcing biases [11][17]
- The dataset allows for systematic testing of various mainstream models across eight computer vision tasks, revealing significant disparities in accuracy based on demographic factors [11][12]

Findings and Implications
- The research identified previously unrecognized biases, such as lower recognition accuracy for older individuals and women, highlighting the need for improved model performance across diverse demographics [13][15]
- FHIBE serves as a pivotal tool for promoting responsible AI development and aims to pave the way for ethical data collection practices in the future [17][18]
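The disaggregated bias diagnostics FHIBE enables boil down to computing a model's accuracy separately for each demographic subgroup and comparing the extremes. A minimal sketch of that evaluation loop (the group labels and data are illustrative, not FHIBE's actual annotation schema):

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Per-group accuracy for a disaggregated fairness evaluation."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += (t == p)  # bool counts as 0/1
    return {g: correct[g] / total[g] for g in total}

def accuracy_gap(per_group):
    """Worst-case disparity: best-served group minus worst-served group."""
    vals = list(per_group.values())
    return max(vals) - min(vals)
```

A gap near zero means the model serves all annotated subgroups about equally well; the study's findings correspond to large gaps along axes such as age and gender.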
Nanjing University, Yingshi Innovation, and Qixia District Sign Strategic Cooperation Agreement; Yingshi Intelligent Imaging Algorithm Innovation Center Unveiled
Nan Jing Ri Bao· 2025-11-05 02:01
Core Insights
- A strategic cooperation agreement was signed on November 4 between Nanjing University, Yingshi Innovation, and Qixia District, leading to the establishment of the Yingshi Intelligent Imaging Algorithm Innovation Center [1]

Group 1: Company Overview
- Yingshi Innovation Technology Co., Ltd. is a global leader in intelligent imaging, focusing on the research, production, and sales of panoramic cameras, action cameras, and panoramic drones [1]

Group 2: Strategic Collaboration
- The collaboration aims to deepen the synergy between academia, local government, and industry, focusing on talent cultivation and technological innovation [1]
- Yingshi will leverage Nanjing University's talent resources to establish the Yingshi Intelligent Imaging Algorithm Innovation Center, concentrating on AI imaging algorithms, VR/AR, and computer vision [1]
- A talent cultivation base will be jointly built, facilitating internships, graduation projects, and entrepreneurship training to develop high-quality, application-oriented, and innovative talents [1]

Group 3: Support and Development
- Qixia District will support Yingshi Innovation in implementing demonstration applications in industrial manufacturing, intelligent meetings, and urban governance [1]
- Collaboration will also extend to other universities in Nanjing and complementary technology enterprises for research and development, talent training, and practical applications [1]
Nanjing University, Yingshi Innovation, and Qixia District Sign Strategic Cooperation Agreement
Xin Lang Cai Jing· 2025-11-04 13:25
Core Insights
- Nanjing University, Yingshi Innovation, and Qixia District signed a strategic cooperation agreement to establish the Yingshi Intelligent Imaging Algorithm Innovation Center [1]
- The collaboration focuses on AI imaging algorithms, VR/AR, and computer vision technologies [1]
- A talent cultivation base will be created to train high-quality application-oriented and innovative talents in line with industry needs [1]

Group 1
- The Yingshi Intelligent Imaging Algorithm Innovation Center will leverage Nanjing University's talent resources and Qixia District's policy support [1]
- The partnership aims to conduct internships, graduation projects, and entrepreneurship training [1]
- Collaboration will extend to other universities and complementary technology enterprises for research and talent development [1]

Group 2
- Qixia District will support Yingshi Innovation in demonstrating applications in industrial manufacturing, smart meetings, and urban governance [1]
Geling Deep Vision, the A-Share Market's First Computer Vision Stock, Remains Under Earnings Pressure with a Loss Exceeding 100 Million Yuan in the First Three Quarters
Nan Fang Du Shi Bao· 2025-10-30 12:08
Core Viewpoint
- Geling Deep Vision (688207.SH), known as the "first AI computer vision stock" on the Sci-Tech Innovation Board, reported a net loss attributable to shareholders of 47.49 million yuan for Q3 2025, indicating ongoing pressure on profitability despite a significant revenue increase [1][3].

Financial Performance
- In Q3 2025, Geling Deep Vision's operating revenue reached 51.76 million yuan, a year-on-year increase of 453.28%. Even so, this figure remains modest compared to the roughly 70 million yuan range of 2021-2023, and follows a drastic drop to 9.35 million yuan in 2024 [1][3].
- For the first three quarters of 2025, the company reported a total net loss of 127 million yuan, a slight improvement from the 138 million yuan loss in the same period of 2024 [1].

Cash Flow and Client Structure
- The company's operating cash flow remains concerning, with a net outflow of 62.56 million yuan in Q3 2025; this trend of cash outflow has persisted since 2024 [3].
- Geling Deep Vision's financial situation is closely tied to its client structure, with clients highly concentrated in smart finance and special-purpose fields. The company noted a slowdown in product demand as clients tightened budgets under macroeconomic pressure [3][4].

Major Clients and Revenue Diversification
- In 2024, the Agricultural Bank of China was the largest client, contributing 44.44% of the company's annual revenue. By the first three quarters of 2025, however, revenue from clients other than the Agricultural Bank accounted for nearly 90% of total revenue, indicating a push for business diversification [3][4].

Research and Development Focus
- Geling Deep Vision is investing heavily in two major projects, multimodal large-model technology and smart energy farms, with expected investments of 368 million yuan and 50.58 million yuan, respectively [4].
- The smart energy farm project aims to utilize AI and controlled photosynthesis technologies for efficient microalgae cultivation, which has raised investor concerns about potential distraction from core business operations [5].

Workforce and Talent Management
- The company's R&D headcount fell significantly, from 318 in the first half of 2024 to 227 in the same period of 2025, while the average R&D salary declined from 189,700 yuan to 178,900 yuan [5].
- Geling Deep Vision has warned that failure to retain key technical talent or attract new talent could create risks of talent shortages and loss of critical technology personnel [5].
What Directions Can Autonomous Driving Still Pursue at This Year's CVPR?
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- The article emphasizes the importance of targeted guidance and mentorship for students aiming to publish high-quality papers at top conferences like CVPR and ICLR, highlighting the need for strategic effort in the final stages of the submission process [1][2][4].

Group 1: Submission Guidance
- The article notes that the majority of accepted papers at past conferences focus on localized breakthroughs and verifiable improvements, aligning closely with the main themes of their respective years [1].
- It suggests that the main theme for CVPR 2026 is likely to be "world models," indicating a strategic direction for potential submissions [1].
- The article encourages students to leverage the experience of predecessors to enhance their submission quality, particularly in the final stages of preparation [2].

Group 2: Mentorship and Support
- The organization, "Automated Driving Heart," is described as the largest AI technology media platform in China, with extensive academic resources and a deep understanding of the challenges in interdisciplinary fields like autonomous driving and robotics [3].
- The article highlights the success rate of their mentorship program, with a 96% acceptance rate for students over the past three years, indicating the effectiveness of their guidance [5].
- It outlines the personalized support provided, including assistance with research thinking, familiarization with research processes, and practical application of theoretical models [7][13].

Group 3: Program Structure and Offerings
- The article details the structured support offered, including personalized paper guidance, real-time interaction with mentors, and unlimited access to recorded sessions for review [13].
- It specifies that the program caters to various academic levels and goals, from foundational courses for beginners to advanced mentorship for experienced researchers [17][19].
- The organization also provides opportunities for outstanding students, such as recommendations to prestigious institutions and direct referrals to leading tech companies [19].
A Roundup of All the ICCV Awards; Congratulations to Jun-Yan Zhu's Team on the Best Paper
具身智能之心· 2025-10-26 04:02
Core Insights
- The article highlights the significant presence of Chinese authors at ICCV 2025, accounting for 50% of submissions and showcasing China's growing influence in the field of computer vision [1].

Awards and Recognitions
- The Best Paper Award (Marr Prize) went to a study titled "Generating Physically Stable and Buildable Brick Structures from Text," which introduced BRICKGPT, a model that generates stable brick structures from textual prompts [4][24].
- The Best Student Paper Award went to "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which presents a method for editing images without the need for inversion [6][38].
- Honorable mentions for Best Paper included "Spatially-Varying Autofocus," which innovatively allows cameras to focus at different depths simultaneously [7][42].
- Honorable mentions for Best Student Paper included "RayZer: A Self-supervised Large View Synthesis Model," which autonomously reconstructs camera parameters and generates novel views from uncalibrated images [9][47].

Notable Research Contributions
- The BRICKGPT model was trained on a dataset of over 47,000 brick structures, demonstrating its ability to generate aesthetically pleasing and stable designs that can be assembled manually or by robotic arms [24][26].
- FlowEdit utilizes a differential equation to map source and target distributions directly, achieving advanced results without model-specific dependencies [39][40].
- The "Fast R-CNN" paper, awarded the Helmholtz Prize, significantly improved training and testing speeds while enhancing detection accuracy in object recognition tasks [10][54].
- The research on modified activation functions, which introduced the Parametric ReLU (PReLU), achieved a top-5 test error of 4.94% on the ImageNet dataset, surpassing human-level performance [58][60].

Awarded Teams and Individuals
- The SMPL body model team developed a highly accurate 3D human model based on extensive 3D scan data, enhancing compatibility with mainstream rendering pipelines [62][66].
- The VQA team created a dataset for visual question answering, containing approximately 250,000 images and 7.6 million questions, facilitating deeper understanding and reasoning about image content [68][69].
- Distinguished researchers David Forsyth and Michal Irani received the Outstanding Researcher Award for their contributions to computer vision and machine learning [72][75].
- Rama Chellappa was honored with the Azriel Rosenfeld Lifetime Achievement Award for his extensive work in computer vision and pattern recognition [78].
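The Parametric ReLU mentioned above keeps ReLU's identity for positive inputs but gives negative inputs a learned slope a, i.e. f(x) = max(0, x) + a·min(0, x). A NumPy sketch (a fixed a = 0.25, the original paper's initial value, stands in for the learned per-channel parameter):

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for x > 0, slope `a` for x <= 0.
    With a = 0 this reduces to plain ReLU; in the original work `a`
    is learned during training rather than fixed."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)
```

Letting the network learn a small nonzero negative slope, instead of ReLU's hard zero, was a key ingredient in the 4.94% top-5 ImageNet result cited above.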