Computer Vision
Liaoning Youth Scientists Forum Held in Shenyang
Liao Ning Ri Bao· 2025-11-24 01:04
This year's forum, themed "Smart Innovation in Liaoning, AI Empowerment," featured an invited report by Academician Tang Lixin titled "Intelligent Industrial Data Analytics and Optimization." Experts from universities and research institutes across the province spoke on AI-driven industrial transformation and upgrading, and on frontier technologies and innovative applications, with reports covering industrial intelligence, smart energy, robotics, medical-engineering integration, large models, and computer vision. The forum recommended strengthening the foundations of industrial intelligence to drive the upgrading of traditional industries; focusing on frontier-technology applications to open up emerging tracks; deepening "industry-academia-research-application" integration to build an innovation ecosystem; and consolidating the talent support system to energize young innovators. Attendees included representatives of provincial universities, research institutes, enterprises, and academic societies; youth-committee representatives of the provincial CPPCC; and selectees of the China Association for Science and Technology's Young Talent Support Project doctoral-student program and its engineer cultivation program. The 8th Liaoning Youth Scientists Forum was held in Shenyang on November 23, providing a high-level academic platform for young scientists and engineers to exchange ideas, spark insight, and jointly advance innovation, further promoting the deep integration of AI with traditional industries and supporting Liaoning's digital economy and its drive to become a strong intelligent-manufacturing province. ...
AI Vision's GPT Moment: Meta's New Model "Segments the World" in One Click, and Netizens Call It Crazy
36Ke· 2025-11-20 10:04
Core Insights
- Meta has launched a new family of models called SAM 3D, which includes SAM 3D Objects for object and scene reconstruction and SAM 3D Body for human shape estimation [1][12]
- The SAM 3D series allows users to extract 3D models from 2D images with high accuracy, enabling 360-degree rotation without noticeable flaws [1][11]
- SAM 3 introduces a new feature called "promptable concept segmentation," enhancing the model's versatility in image segmentation tasks [1][19]

SAM 3D Objects
- SAM 3D Objects has achieved significant advances in 3D object reconstruction, using a data annotation engine that has labeled nearly one million images to generate over 3.14 million mesh models [7][9]
- The model outperforms existing leading models in human preference tests by a 5:1 margin, enabling near-real-time 3D applications [10][11]
- SAM 3D Objects can reconstruct the shapes, textures, and poses of objects, allowing users to move the camera for different viewing angles [11][12]

SAM 3D Body
- SAM 3D Body focuses on human 3D reconstruction, accurately estimating human poses and shapes from single images, even in complex scenarios [12][13]
- The model supports prompt inputs, allowing users to guide predictions through segmentation masks and keypoints, enhancing interactivity [12][13]
- SAM 3D Body has been trained on approximately 8 million high-quality samples, ensuring robustness across diverse scenarios [13][16]

SAM 3 Model Features
- SAM 3 is a unified model capable of detecting, segmenting, and tracking objects based on text, example images, or visual prompts, significantly improving flexibility in segmentation tasks (see the sketch after this summary) [18][19]
- The model shows a 100% improvement in concept segmentation performance on the SA-Co benchmark compared to previous models [19][20]
- Meta has implemented a collaborative data engine involving both AI and human annotators to improve data-labeling efficiency and model performance [20][23]

Conclusion
- The rise of generative AI is transforming computer vision (CV) capabilities, expanding the boundaries of model training and dataset creation [24]
- Meta is actively applying these technologies in real business scenarios, suggesting that the SAM and SAM 3D series models may yield further innovations as data and user feedback accumulate [24]
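To make "promptable concept segmentation" concrete: instead of clicking one object at a time (SAM 1/2 style), you hand the model a short noun phrase and get back a mask for every matching instance. The sketch below illustrates only that interface shape; the `segment_by_concept` function, the `Detection` type, and the brightness-threshold "model" are hypothetical stand-ins, not Meta's actual SAM 3 API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected instance: a binary mask plus a confidence score."""
    mask: np.ndarray  # (H, W) boolean mask
    score: float

def segment_by_concept(image: np.ndarray, concept: str) -> list[Detection]:
    """Hypothetical stand-in for promptable concept segmentation: given an
    image and a short noun-phrase prompt, return one mask per matching
    instance (SAM 3's headline feature, vs. SAM 1/2's one-object-per-prompt
    interaction). The toy 'model' below ignores the prompt and thresholds
    brightness, purely so the sketch runs end to end; a real system would
    run a text-grounded detector plus a mask head."""
    gray = image.mean(axis=-1)
    fg = gray > gray.mean()  # toy stand-in for concept grounding
    return [Detection(mask=fg, score=0.5)] if fg.any() else []

# Usage: every instance of the prompted concept comes back as its own mask.
img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
for det in segment_by_concept(img, "yellow school bus"):
    print(det.score, int(det.mask.sum()), "pixels")
```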
Seven "Deep Technologies" Set to Lead a Global Agricultural Transformation
Ke Ji Ri Bao· 2025-11-13 01:00
Core Insights
- The global agriculture sector is at a critical juncture, facing unprecedented pressure from climate change, resource degradation, demographic shifts, and geopolitical instability, necessitating a systemic transformation led by "deep technology" [1]
- Deep technology, which encompasses advanced scientific and engineering innovations, is expected to revolutionize the agricultural industry and address significant global challenges over the next decade [1]

Group 1: Deep Technology in Agriculture
- Deep technologies such as generative AI, computer vision, edge IoT, satellite remote sensing, robotics, CRISPR gene editing, and nanotechnology are identified as key drivers for transforming global agriculture into a more resilient, sustainable, and efficient system [1]
- The World Economic Forum's "AI in Agriculture Innovation Initiative" released a report highlighting the potential of these technologies to reshape agricultural practices [1]

Group 2: Generative AI
- Generative AI leverages advances in large language models and the increasing availability of agricultural data, providing personalized crop-management advice and localized farming plans [2]
- Applications include acting as an "AI advisor" for farmers, assisting governments in macro-level crop planning, and accelerating the development of new crop varieties through gene editing [2]
- The lack of high-quality training data, particularly for localized scenarios, remains a significant barrier to widespread adoption of generative AI in agriculture [2]

Group 3: Computer Vision
- Computer vision enables machines to interpret images and videos, generating decision-making suggestions and reducing reliance on human analysis [3]
- In agriculture it is used for precise identification of crop diseases, weeds, and pests, as well as real-time monitoring of crop growth (a combined CV/edge-IoT sketch follows this summary) [3]
- The variability of field conditions and plant growth stages poses challenges for large-scale application of computer vision in agriculture [3]

Group 4: Edge IoT
- Edge IoT processes data on the device or at the nearby network edge, allowing low-latency real-time responses and accelerating autonomous decision-making [4]
- It is particularly beneficial in rural areas with weak network coverage, enabling applications such as automated irrigation and early disease-warning systems [4]
- High equipment costs and interoperability issues between different edge systems are current challenges in this field [4]

Group 5: Satellite Remote Sensing
- Satellite remote sensing is increasingly applied in agriculture thanks to improved spatial and spectral resolution and higher data-collection frequency [6]
- It allows efficient, low-cost monitoring of large geographic areas, assessing crop health and predicting pest outbreaks [6]
- The precision of satellite remote sensing still needs improvement for small-scale, dispersed farmland and multi-crop rotations [7]

Group 6: Robotics
- Robotics automates labor-intensive or complex tasks in agriculture, integrating perception and decision-making capabilities [8]
- With advances in AI perception and cloud-edge collaboration, agricultural robots can perform tasks such as precision planting and automated harvesting [8]
- The high cost of these technologies hinders adoption in countries with abundant low-wage labor [9]

Group 7: CRISPR Technology
- CRISPR gene editing is a key force in agricultural development, allowing precise modifications to DNA to enhance desirable traits in crops [10]
- It aims to accelerate the breeding of crops that are drought-resistant, pest-resistant, and nutritionally enhanced [10]
- Regulatory hurdles and public-acceptance issues remain significant obstacles to the commercialization of CRISPR technology [11]

Group 8: Nanotechnology
- Nanotechnology shows potential in agriculture for pest control, nutrient management, and the controlled release of agricultural inputs [12]
- The lack of long-term data on environmental and health impacts hampers its widespread application [12]
- The report suggests that governments and institutions should support promising agricultural deep-tech projects through policy coordination, funding, talent development, and infrastructure building [12]
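As referenced above, here is a minimal sketch of how the computer-vision and edge-IoT pieces typically combine: classify frames on-device and emit network traffic only for positive detections, so a weak rural link carries occasional alerts rather than raw video. The threshold, the alert format, and the color-heuristic "classifier" are illustrative assumptions, not from the WEF report.

```python
import numpy as np

DISEASE_THRESHOLD = 0.8  # assumed alert cutoff, not from the report

def classify_leaf(patch: np.ndarray) -> float:
    """Stub for an on-device disease classifier. A real deployment would run
    a small quantized CNN here; this toy scores a patch by how 'brown' it is."""
    r, g = patch[..., 0].mean(), patch[..., 1].mean()
    return float(np.clip((r - g) / 255.0 + 0.5, 0.0, 1.0))

def edge_loop(frames: list[np.ndarray]) -> list[dict]:
    """Edge-IoT pattern: process every frame locally and emit network traffic
    only for positive detections, so the system remains useful over a weak
    rural link that could never carry a raw video feed."""
    alerts = []
    for i, frame in enumerate(frames):
        score = classify_leaf(frame)        # low-latency, on-device inference
        if score >= DISEASE_THRESHOLD:      # uplink only the rare positives
            alerts.append({"frame": i, "score": round(score, 3)})
    return alerts

frames = [np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8) for _ in range(5)]
print(edge_loop(frames))
```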
A World First: Landmark Nature Study Says Computer Vision Is Leaving the "Stolen Data" Era Behind
36Ke· 2025-11-06 08:13
Core Insights
- The article discusses the launch of FHIBE, the world's first publicly available, globally diverse dataset built on user consent, aimed at assessing fairness in human-centric computer vision tasks [2][5][17]
- FHIBE addresses ethical problems in AI data collection, such as unauthorized use, lack of diversity, and social biases, which have been prevalent in existing datasets [2][6][17]

Dataset Overview
- FHIBE includes 10,318 images from 81 countries, representing 1,981 distinct individuals, and covers a wide range of visual tasks from facial recognition to visual question answering [2][6]
- The dataset features comprehensive annotations, including demographic characteristics, physical attributes, environmental factors, and pixel-level labels, enabling detailed bias diagnostics [3][7]

Ethical Considerations
- The data collection process adhered to ethical standards, including GDPR compliance, ensuring participants gave informed consent to the use of their biometric data in AI fairness research [10][17]
- Participants provided self-reported information such as age, pronouns, ancestry, and skin color, yielding 1,234 cross-group combinations that enhance diversity [6][11]

Methodological Rigor
- FHIBE is designed specifically for bias assessment, ensuring it is used solely for measuring fairness rather than reinforcing biases [11][17]
- The dataset enables systematic testing of mainstream models across eight computer vision tasks, revealing significant accuracy disparities across demographic groups (a minimal disparity computation is sketched after this summary) [11][12]

Findings and Implications
- The research identified previously unrecognized biases, such as lower recognition accuracy for older individuals and women, highlighting the need to improve model performance across diverse demographics [13][15]
- FHIBE serves as a pivotal tool for promoting responsible AI development and aims to pave the way for ethical data-collection practices in the future [17][18]
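The bias diagnostics described above ultimately reduce to comparing a model's accuracy across self-reported demographic groups and flagging the gap. A minimal sketch of that computation follows, with invented group labels and outcomes; FHIBE's real schema, tasks, and metrics are far richer.

```python
from collections import defaultdict

def accuracy_by_group(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-group accuracy from (group_label, prediction_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return {g: hits[g] / totals[g] for g in totals}

# Invented evaluation records: (self-reported age group, was the model right?)
records = [("18-29", True), ("18-29", True), ("18-29", False),
           ("60+", True), ("60+", False), ("60+", False)]
acc = accuracy_by_group(records)
print(acc)  # ≈ {'18-29': 0.67, '60+': 0.33}
print(f"disparity: {max(acc.values()) - min(acc.values()):.2f}")  # the kind of gap the study flags
```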
Nanjing University, Yingshi Innovation, and Qixia District Sign Strategic Cooperation Agreement; Yingshi Intelligent Imaging Algorithm Innovation Center Unveiled
Nan Jing Ri Bao· 2025-11-05 02:01
Core Insights
- A strategic cooperation agreement was signed on November 4 between Nanjing University, Yingshi Innovation, and Qixia District, leading to the establishment of the Yingshi Intelligent Imaging Algorithm Innovation Center [1]

Group 1: Company Overview
- Yingshi Innovation Technology Co., Ltd. is a global leader in intelligent imaging, focusing on the research, production, and sale of panoramic cameras, action cameras, and panoramic drones [1]

Group 2: Strategic Collaboration
- The collaboration aims to deepen synergy between academia, local government, and industry, focusing on talent cultivation and technological innovation [1]
- Yingshi will leverage Nanjing University's talent resources to establish the Yingshi Intelligent Imaging Algorithm Innovation Center, concentrating on AI imaging algorithms, VR/AR, and computer vision [1]
- A talent cultivation base will be jointly built, facilitating internships, graduation projects, and entrepreneurship training to develop high-quality, application-oriented, innovative talent [1]

Group 3: Support and Development
- Qixia District will support Yingshi Innovation in deploying demonstration applications in industrial manufacturing, intelligent meetings, and urban governance [1]
- Collaboration will also extend to other universities in Nanjing and complementary technology enterprises for research and development, talent training, and practical applications [1]
Nanjing University, Yingshi Innovation, and Qixia District Sign Strategic Cooperation Agreement
Xin Lang Cai Jing· 2025-11-04 13:25
Core Insights
- Nanjing University, Yingshi Innovation, and Qixia District signed a strategic cooperation agreement to establish the Yingshi Intelligent Imaging Algorithm Innovation Center [1]
- The collaboration focuses on AI imaging algorithms, VR/AR, and computer vision technologies [1]
- A talent cultivation base will be created to train high-quality, application-oriented, innovative talent in line with industry needs [1]

Group 1
- The Yingshi Intelligent Imaging Algorithm Innovation Center will leverage Nanjing University's talent resources and Qixia District's policy support [1]
- The partnership will run internships, graduation projects, and entrepreneurship training [1]
- Collaboration will extend to other universities and complementary technology enterprises for research and talent development [1]

Group 2
- Qixia District will support Yingshi Innovation in demonstrating applications in industrial manufacturing, smart meetings, and urban governance [1]
Geling Deep Vision, the A-Share Market's First Computer Vision Stock, Stays Under Earnings Pressure with a Loss Topping 100 Million Yuan for the First Three Quarters
Nan Fang Du Shi Bao· 2025-10-30 12:08
Core Viewpoint
- Geling Deep Vision (688207.SH), known as the "first AI computer vision stock" on the Sci-Tech Innovation Board, reported a net loss attributable to shareholders of 47.49 million yuan for Q3 2025, indicating ongoing pressure on profitability despite a significant revenue increase [1][3]

Financial Performance
- In Q3 2025, Geling Deep Vision's operating revenue reached 51.76 million yuan, a year-on-year increase of 453.28% (a quick consistency check on these figures follows this summary); the figure is still modest against the roughly 70-million-yuan levels of 2021-2023, after a drastic drop to 9.35 million yuan in 2024 [1][3]
- For the first three quarters of 2025, the company reported a total net loss of 127 million yuan, a slight improvement from the 138 million yuan loss in the same period of 2024 [1]

Cash Flow and Client Structure
- Operating cash flow remains concerning, with a net outflow of 62.56 million yuan in Q3 2025; this outflow trend has persisted since 2024 [3]
- The company's finances are closely tied to its client structure, with clients highly concentrated in smart finance and special-purpose fields; the company noted a slowdown in product demand as the macroeconomic environment tightened client budgets [3][4]

Major Clients and Revenue Diversification
- In 2024, the Agricultural Bank of China was the largest client, contributing 44.44% of annual revenue; by the first three quarters of 2025, however, revenue from clients other than the Agricultural Bank accounted for nearly 90% of the total, indicating a push toward business diversification [3][4]

Research and Development Focus
- Geling Deep Vision is investing heavily in two major projects, multimodal large-model technology and smart energy farms, with planned investments of 368 million yuan and 50.58 million yuan respectively [4]
- The smart energy farm project aims to use AI and controlled-photosynthesis technologies for efficient microalgae cultivation, which has raised investor concerns about distraction from core business operations [5]

Workforce and Talent Management
- R&D headcount fell sharply, from 318 in the first half of 2024 to 227 in the same period of 2025, while the average R&D salary declined from 189,700 yuan to 178,900 yuan [5]
- The company has warned that failure to retain key technical talent or attract new talent could create risks of talent shortages and loss of critical technology personnel [5]
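As flagged above, the headline figures are mutually consistent, which is worth checking since the jump sounds extreme: growing 453.28% year on year to 51.76 million yuan implies a base of about 9.36 million yuan, matching (to rounding) the 9.35 million yuan cited for 2024.

```python
# Consistency check on the article's figures (values as reported).
q3_2025_revenue = 51.76      # million yuan, Q3 2025
yoy_growth = 4.5328          # +453.28% year on year
implied_base = q3_2025_revenue / (1 + yoy_growth)
print(f"implied prior-year base: {implied_base:.2f} million yuan")  # ≈ 9.36
# Matches the cited 9.35 million yuan up to rounding, and suggests that
# figure is the Q3 2024 base rather than a full-year number.
```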
At This Year's CVPR, What Directions Are Still Open for Autonomous Driving?
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- The article emphasizes the importance of targeted guidance and mentorship for students aiming to publish high-quality papers at top conferences such as CVPR and ICLR, highlighting the need for strategic effort in the final stages of the submission process [1][2][4]

Group 1: Submission Guidance
- The article notes that the majority of accepted papers at past conferences focus on localized breakthroughs and verifiable improvements, aligning closely with each year's main themes [1]
- It suggests that the main theme for CVPR 2026 is likely to be "world models," indicating a strategic direction for potential submissions [1]
- Students are encouraged to leverage the experience of predecessors to improve submission quality, particularly in the final stages of preparation [2]

Group 2: Mentorship and Support
- The organization, "Automated Driving Heart," is described as the largest AI technology media platform in China, with extensive academic resources and a deep understanding of the challenges in interdisciplinary fields such as autonomous driving and robotics [3]
- The article cites a 96% acceptance rate for mentored students over the past three years as evidence of the program's effectiveness [5]
- Personalized support includes help with research thinking, familiarization with research processes, and practical application of theoretical models [7][13]

Group 3: Program Structure and Offerings
- Structured support includes personalized paper guidance, real-time interaction with mentors, and unlimited access to recorded sessions for review [13]
- The program caters to various academic levels and goals, from foundational courses for beginners to advanced mentorship for experienced researchers [17][19]
- The organization also offers opportunities for outstanding students, such as recommendations to prestigious institutions and direct referrals to leading tech companies [19]
A Rundown of All the ICCV Awards: Congratulations to Zhu Junyan's Team on Best Paper
具身智能之心· 2025-10-26 04:02
Core Insights
- The article highlights the significant presence of Chinese authors at ICCV 2025, accounting for 50% of submissions and showcasing China's growing influence in the field of computer vision [1]

Awards and Recognitions
- The Best Paper Award (Marr Prize) went to "Generating Physically Stable and Buildable Brick Structures from Text," which introduced BRICKGPT, a model that generates stable brick structures from text prompts [4][24]
- The Best Student Paper Award went to "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which presents a method for editing images without the need for inversion [6][38]
- Honorary mentions for Best Paper included "Spatially-Varying Autofocus," which innovatively allows cameras to focus at different depths simultaneously [7][42]
- Honorary mentions for Best Student Paper included "RayZer: A Self-supervised Large View Synthesis Model," which recovers camera parameters and generates new viewpoints from uncalibrated images without supervision [9][47]

Notable Research Contributions
- BRICKGPT was trained on a dataset of over 47,000 brick structures, demonstrating its ability to generate aesthetically pleasing, stable designs that can be assembled by hand or by robotic arms [24][26]
- FlowEdit uses a differential equation to map the source distribution directly onto the target distribution, achieving advanced results without model-specific dependencies (a conceptual sketch follows this summary) [39][40]
- The "Fast R-CNN" method, awarded the Helmholtz Prize, significantly improved training and testing speed while raising detection accuracy in object-recognition tasks [10][54]
- Research on modified activation functions, which introduced a new parameterized ReLU, achieved a top-5 test error of 4.94% on the ImageNet dataset, surpassing human-level performance [58][60]

Awarded Teams and Individuals
- The SMPL body model team developed a highly accurate 3D human model based on extensive 3D-scan data, with strong compatibility with mainstream rendering pipelines [62][66]
- The VQA team created a visual question answering dataset of approximately 250,000 images and 7.6 million questions, facilitating deeper understanding of and reasoning about image content [68][69]
- Distinguished researchers David Forsyth and Michal Irani received the Outstanding Researcher Award for their contributions to computer vision and machine learning [72][75]
- Rama Chellappa was honored with the Azriel Rosenfeld Lifetime Achievement Award for his extensive work in computer vision and pattern recognition [78]
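The FlowEdit bullet above deserves unpacking. "Mapping source and target distributions directly" means that instead of inverting the image to noise and regenerating, one integrates an ODE whose velocity is the difference between the flow model's velocity field under the target prompt and under the source prompt. The sketch below is a heavily simplified, conceptual Euler integration; the `velocity` stub, the step count, and the schedule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def velocity(z: np.ndarray, t: float, prompt: str) -> np.ndarray:
    """Stub for a pretrained flow model's velocity field v(z, t, prompt).
    A real system would evaluate a rectified-flow / diffusion transformer;
    this toy just pulls z toward a prompt-dependent random target."""
    rng = np.random.default_rng(sum(map(ord, prompt)))  # deterministic toy seed
    return rng.standard_normal(z.shape) - z

def flow_edit(z_src: np.ndarray, src_prompt: str, tgt_prompt: str,
              steps: int = 50) -> np.ndarray:
    """Inversion-free editing, conceptually: move the image along the
    DIFFERENCE of the two prompt-conditioned velocity fields, so it drifts
    from the source distribution toward the target one without ever being
    mapped back to pure noise."""
    z, dt = z_src.astype(float).copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        dv = velocity(z, t, tgt_prompt) - velocity(z, t, src_prompt)
        z = z + dt * dv  # Euler step along the velocity difference
    return z

edited = flow_edit(np.zeros((4, 4)), "a photo of a cat", "a photo of a dog")
print(edited.round(2))
```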
Just In: ICCV Best Paper Announced, with Zhu Junyan's Team Taking the Crown for Brick Structures
具身智能之心· 2025-10-23 00:03
Core Insights
- The article covers the recent International Conference on Computer Vision (ICCV), held in Hawaii, highlighting the award-winning research papers and their contributions to the field of computer vision [2][5][24]

Group 1: Award Winners
- The Best Paper Award went to a Carnegie Mellon University (CMU) team for their paper "Generating Physically Stable and Buildable Brick Structures from Text," led by noted AI scholar Zhu Junyan [3][7][11]
- The Best Student Paper Award went to a Technion paper, "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which introduces a novel image-editing method [28][30]

Group 2: Conference Statistics
- ICCV is one of the top three computer vision conferences and is held biennially; this year's conference received 11,239 valid submissions and accepted 2,699 papers, a 24% acceptance rate and a significant increase over the previous edition [5]

Group 3: Research Contributions
- The CMU paper presents BrickGPT, the first method capable of generating physically stable and interconnected brick assembly models from text prompts; the work includes a large dataset of over 47,000 brick structures and 28,000 unique 3D objects with detailed descriptions [11][13]
- The FlowEdit paper proposes an image-editing approach that bypasses the traditional image-to-noise inversion process, achieving higher-fidelity edits by establishing a direct mapping path between the source and target image distributions [32][34]

Group 4: Methodology and Results
- BrickGPT uses an autoregressive large language model trained on the brick-structure dataset, incorporating validity checks and a physics-aware rollback mechanism to keep generated designs stable (a toy version of this loop is sketched after this summary) [13][19]
- Experimental results show that BrickGPT outperforms baseline models in validity and stability, achieving a 100% validity rate and 98.8% stability in generated structures [20][22]
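As referenced above, the validity-check-plus-rollback mechanism behaves like rejection sampling with backtracking during autoregressive decoding: propose the next brick, test it, and if no valid proposal can be found, undo the previous brick and resample. The sketch below is a toy version under invented assumptions, with a random proposal stub standing in for the language model and a "must rest on something" rule standing in for the paper's physics solver.

```python
import random

Brick = tuple[int, int, int]  # (x, y, z) cell of a 1x1 brick in a toy grid world

def propose_brick(structure: list[Brick], rng: random.Random) -> Brick:
    """Stub for the autoregressive model's next-brick proposal."""
    x, y = rng.randint(0, 1), rng.randint(0, 1)
    return (x, y, len(structure))  # naive policy: keep stacking upward

def is_valid(structure: list[Brick], brick: Brick) -> bool:
    """Toy physics check: no duplicates, and every brick above ground level
    must sit directly on another brick (stand-in for a real stability solver)."""
    if brick in structure:
        return False
    x, y, z = brick
    return z == 0 or (x, y, z - 1) in structure

def generate(n_bricks: int, max_retries: int = 20, seed: int = 0) -> list[Brick]:
    """Generate-validate-rollback loop: sample a brick, test it, and if every
    retry fails, roll back the previously placed brick and try again."""
    rng = random.Random(seed)
    structure: list[Brick] = []
    while len(structure) < n_bricks:
        for _ in range(max_retries):
            brick = propose_brick(structure, rng)
            if is_valid(structure, brick):
                structure.append(brick)
                break
        else:                      # no valid proposal found:
            if structure:
                structure.pop()    # physics-aware rollback of the last brick
    return structure

print(generate(5))
```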