Computer Vision
A Rundown of All the ICCV Awards; Congratulations to Jun-Yan Zhu's Team on the Best Paper
量子位· 2025-10-22 05:48
Core Points
- The ICCV 2025 conference in Hawaii highlighted significant contributions from Chinese researchers, who accounted for 50% of the paper submissions [1]
- Various prestigious awards were announced, showcasing advancements in computer vision research [3]

Award Highlights
- Best Paper Award (Marr Prize): "Generating Physically Stable and Buildable Brick Structures from Text" introduced BrickGPT, a model that generates stable brick structures from text prompts, trained on a dataset of over 47,000 structures [4][24][26]
- Best Student Paper Award: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models" proposed a method for image editing without inversion, achieving state-of-the-art results [6][39][40]
- Best Paper Honorable Mention: "Spatially-Varying Autofocus" developed a technique for dynamic depth adjustment in imaging, enhancing focus clarity across scenes [7][42][44]
- Best Student Paper Honorable Mention: "RayZer: A Self-supervised Large View Synthesis Model" demonstrated 3D perception capabilities using uncalibrated images [9][47][49]

Special Awards
- Helmholtz Prize: awarded to "Fast R-CNN" for its efficient object detection, which significantly improved training and testing speeds [10][52][54]
- A second Helmholtz Prize went to research on rectified activation functions, which achieved performance surpassing human-level accuracy on ImageNet [10][59][60]
- Everingham Prize: recognized teams for their contributions to 3D modeling and visual question answering [12][63][68]
- Distinguished Researcher Award: David Forsyth and Michal Irani were honored for their impactful work in computer vision [14][73][76]
- Azriel Rosenfeld Lifetime Achievement Award: Rama Chellappa was recognized for his extensive contributions to the field [16][79]

Research Contributions
- The BrickGPT model generates physically stable structures, drawing on a large dataset and novel stability mechanisms [24][26]
- FlowEdit's approach allows seamless image editing across different model architectures, enhancing flexibility in applications [39][40]
- The spatially-varying autofocus technique improves image clarity by dynamically adjusting focus based on scene depth [42][44]
- RayZer's self-supervised learning approach enables 3D scene reconstruction without calibrated camera data [47][49]

Conclusion
- The ICCV 2025 conference showcased groundbreaking research and innovation in computer vision, with significant contributions from many teams and individuals, particularly Chinese researchers [1][3]
New CVPR 2026 Rule: Mandatory Disclosure of Compute Costs; Efficient, Transparent Papers Eligible for Three Recognition Awards
机器之心· 2025-10-22 03:30
Core Viewpoint
- CVPR has introduced a pilot program, the "Compute Resource Report Form (CRF)," to enhance transparency and fairness in AI research by requiring authors to report the computational resources used in their studies, starting from CVPR 2026 [2][4]

Group 1: CRF Implementation
- All paper submitters must provide a detailed report of the computational resources used, including GPU and CPU usage, training time, and model efficiency [7]
- Submission of the CRF is mandatory but will not affect acceptance decisions; the data will be reviewed by an independent committee [6][16]
- The CRF requires basic information about hardware, computation time, and performance results compared against the strongest benchmarks [7][31]

Group 2: Recognition Awards
- Papers demonstrating outstanding computational efficiency and/or transparency may qualify for recognition awards: the Efficient CVPR Badge, the CVPR Compute Star Award, and the CVPR Compute Transparency Champion Award [9][10]
- Awards will be evaluated on objective metrics, ensuring fair assessment across similar task categories [10][27]

Group 3: Submission Process
- Authors are encouraged to review a pre-filled example form to understand how to complete each section of the CRF [11][19]
- The mandatory sections typically take 10-15 minutes to complete; optional sections may require additional time [28]
- Authors should save the filled form as the original PDF without flattening it, so the data fields remain available for processing [12][20]

Group 4: FAQs and Clarifications
- The CRF aims to provide insight into the actual computational costs of different methods; it is a transparency experiment, not a judgment mechanism [15][18]
- High computational resource usage will not be penalized, since significant advances often require substantial resources [17]
- Submitting anonymized Weights & Biases logs is optional but can improve the chances of receiving a recognition award [26]
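The article describes the CRF only as a PDF form covering hardware, computation time, and efficiency. As an illustration of the kind of record being requested, here is a minimal sketch in Python; every field name below is an assumption for illustration, not the actual CRF schema.

```python
from dataclasses import dataclass

@dataclass
class ComputeReport:
    """Hypothetical record of the kind of fields a compute report covers."""
    gpu_model: str          # e.g. "A100-80GB"
    num_gpus: int           # GPUs used concurrently per run
    hours_per_run: float    # wall-clock training time of one run
    num_runs: int           # full runs counted (restarts, ablations)
    cpu_core_hours: float   # CPU-side preprocessing cost

    def total_gpu_hours(self) -> float:
        # Total device-hours across all counted runs.
        return self.num_gpus * self.hours_per_run * self.num_runs

report = ComputeReport("A100-80GB", 8, 72.0, 3, 500.0)
print(report.total_gpu_hours())  # 8 GPUs x 72 h x 3 runs = 1728.0
```

Aggregating everything into total device-hours is one plausible way such reports could be compared across papers within the same task category.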
Just In: ICCV Best Paper Announced; Jun-Yan Zhu's Team Takes the Crown with Toy Bricks
机器之心· 2025-10-22 03:30
Core Insights
- ICCV (International Conference on Computer Vision) announced its best paper and best student paper awards on October 22, 2025, highlighting significant advancements in computer vision research [1][2][4]

Group 1: Best Paper
- The best paper award went to a research team from Carnegie Mellon University (CMU) led by noted AI scholar Jun-Yan Zhu, for the paper "Generating Physically Stable and Buildable Brick Structures from Text" [6][9]
- The paper introduces BrickGPT, a novel method that generates physically stable, interconnected brick assembly models from text prompts, marking a significant advance in the field [9][11]
- To train the model, the team built a large-scale dataset of stable brick structures comprising over 47,000 models covering 28,000 unique 3D objects with detailed text descriptions [11][10]

Group 2: Methodology and Results
- The method discretizes a brick structure into a sequence of text tokens and trains a large language model to predict the next brick to add, enforcing physical stability through validity checks and a rollback mechanism [10][17]
- Experimental results show BrickGPT achieved a 100% validity rate and a 98.8% stability rate, outperforming various baseline models in both effectiveness and stability [20][18]
- The approach generates diverse, aesthetically pleasing brick structures that align closely with the input text prompts, demonstrating high design fidelity [11][20]

Group 3: Best Student Paper
- The best student paper award went to a research team from the Technion for "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which bypasses the traditional inversion-based editing path to better preserve image fidelity [25][28]
- FlowEdit establishes a direct mapping path between the source and target image distributions, yielding lower transport costs and better preservation of the original image structure during editing [31][27]
- The method was validated on advanced text-to-image (T2I) flow models, achieving state-of-the-art results across a range of complex editing tasks, showcasing its efficiency and superiority [31]

Group 4: Other Awards and Recognitions
- The Helmholtz Prize recognized two significant papers for contributions to computer vision benchmarks, including "Fast R-CNN" by Ross Girshick, which improved detection speed and accuracy [36][38]
- The Everingham Prize recognized teams for contributions to 3D modeling and multimodal AI, including the development of the SMPL model and the VQA dataset [41][43]
- Distinguished Researcher Awards went to David Forsyth and Michal Irani for their impactful contributions to computer vision [50][52]
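The token-by-token generation with validity checks and rollback described above can be sketched as follows. This is a toy illustration only: `propose_brick` stands in for the fine-tuned LLM's next-brick prediction and `is_valid` for the paper's collision and physics-stability checks, both of which are hypothetical simplifications here, not the actual BrickGPT implementation.

```python
import random

def propose_brick(structure):
    # Stand-in for the LLM's next-token prediction: propose a brick as
    # (x, layer, width, length) on a 10-unit-wide baseplate. The real
    # model samples brick tokens from a fine-tuned language model.
    return (random.randint(0, 9), len(structure), 2, 4)

def is_valid(structure, brick):
    # Stand-in validity check. The real system rejects bricks that
    # collide, fall out of bounds, or leave the structure unstable.
    x, layer, width, length = brick
    return x + width <= 10

def generate(n_bricks, max_retries=5):
    # Autoregressive generation with validity checks and rollback.
    structure = []
    while len(structure) < n_bricks:
        for _ in range(max_retries):
            brick = propose_brick(structure)
            if is_valid(structure, brick):
                structure.append(brick)
                break
        else:
            # Rollback: no valid brick found after max_retries, so undo
            # the last placement and let the model try a new continuation.
            if structure:
                structure.pop()
    return structure

random.seed(0)
print(len(generate(6)))  # returns exactly the requested number of bricks
```

Because every accepted brick has passed the validity check, a structure built this way is valid by construction, which is the control-flow idea behind the reported 100% validity rate.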
Real-Time Alerts for Minute Rock Mass Changes: Shenzhen University Team Develops Geological Disaster Prevention System
Nan Fang Du Shi Bao· 2025-10-21 07:57
Core Viewpoint
- A research team led by Professor Huang Hui at Shenzhen University has developed a new-generation intelligent monitoring system for geological disasters that integrates computer vision, deep learning, and cloud-edge-end collaborative technology, transforming traditional point-based monitoring into comprehensive, intelligent monitoring [1][3]

Group 1: Traditional Monitoring Limitations
- Traditional geological disaster monitoring relies heavily on embedded sensors and manual inspections, both of which have significant limitations [3]
- Sensors can only monitor preset points and cannot cover entire risk areas, while manual inspections are constrained by weather and terrain, leaving many dangerous areas inaccessible [3]

Group 2: Technological Innovations
- The team proposed a core "cloud-edge-end" collaborative processing technology for graphic information, achieving the transition from point monitoring to comprehensive prevention [3]
- Combining computer graphics, computer vision, and deep learning, the system achieves breakthroughs in three key technical areas: effective capture of abnormal movements in monitored areas, over 85% accuracy in identifying rockfall events, and high-precision measurement of target displacement [3]

Group 3: Application and Impact
- The system has demonstrated its value in multiple scenarios, including 24-hour monitoring of tunnel entrances and high-slope sections of mountain roads, rockfall warnings for railways, stability monitoring in open-pit mining, and slope safety in water conservancy projects [5]
- It has been deployed in Shenzhen's Jiangangshan Park, providing continuous monitoring of and alarms for dangerous rocks and rockfalls [5]
- The monitoring device carries a large-capacity solar power system for uninterrupted operation, giving it strong environmental adaptability and energy self-sufficiency [5]
- The system captures minute changes in rock formations with high-resolution cameras, analyzes the data in real time with built-in intelligent algorithms, triggers multi-level alerts, and uploads data to a cloud management platform via 4G/5G networks [5]
- The technology marks a shift from passive waiting to proactive prediction in geological disaster monitoring and early warning, entering a new phase of "full-domain perception, intelligent deduction, and precise warning" [5]
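The article does not disclose the team's algorithms, but the pipeline it describes (camera frames analyzed in real time, then mapped to multi-level alerts) can be illustrated with a minimal frame-differencing sketch. The function names and threshold values below are illustrative assumptions, not the deployed system.

```python
def mean_abs_diff(frame_a, frame_b):
    # Mean absolute pixel difference between two equal-sized grayscale
    # frames, represented here as flat lists of 0-255 intensities.
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def alert_level(change, thresholds=(2.0, 10.0, 30.0)):
    # Map a change score to a tiered alert: 0 = quiet, 1/2/3 = escalating
    # warnings. The thresholds are arbitrary illustrative values.
    level = 0
    for t in thresholds:
        if change >= t:
            level += 1
    return level

baseline = [100] * 16                 # reference frame of the rock face
current = [100] * 12 + [160] * 4      # a small region has brightened/moved
score = mean_abs_diff(baseline, current)  # (4 * 60) / 16 = 15.0
print(alert_level(score))  # 15.0 >= 2.0 and >= 10.0 -> level 2
```

A real deployment would of course operate on 2D images with noise suppression and per-region analysis, but the escalate-by-threshold structure is one common way to realize "multi-level alerts".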
Apple Eyes Prompt AI: Not for the Product, but for the Berkeley Team's "Visual Brain"
36Kr· 2025-10-14 00:59
Core Insights
- Apple is in the final stages of negotiations to acquire Prompt AI, a startup in the computer vision field, marking its most significant AI move since the $3 billion acquisition of Beats Electronics in 2014 [1][3]
- The acquisition targets Prompt AI's core technology and team, in line with the current Silicon Valley trend of the "acquihire" [3]

Acquisition Details
- Prompt AI, founded in 2023, has a small team of 11 employees and has raised $5 million in seed funding [4][5]
- The company's flagship product, Seemour, is a key focus for Apple, although its commercialization has struggled, leading to plans to take the app offline [7][5]
- Employees not joining Apple will receive compensation and may apply for other positions within the company, indicating a cautious approach to the deal [7]

Technology Fit
- Seemour's capabilities align well with Apple's HomeKit ecosystem, addressing its shortcomings in home security [8]
- The app can accurately identify people and objects, generate scene descriptions, and protect user privacy by processing data locally [8]
- Seemour's underlying technology could also support other Apple initiatives, such as AI smart glasses and autonomous driving [8]

Acquisition Strategy
- Apple's acquisition strategy has historically favored small, focused teams over large-scale mergers, allowing quick integration of technology and talent [9][10]
- The company has made many smaller acquisitions to enhance its core products, in contrast with competitors like Meta and Google, which pursue larger, more integrated deals [10][14]

Industry Context
- The acquisition reflects a broader Silicon Valley trend of tech giants turning to talent acquisition to bolster their AI capabilities [11][14]
- Apple's approach emphasizes precise fit with its existing ecosystem rather than broad coverage, focusing on specific needs in areas like HomeKit, AR, and autonomous driving [14][15]
ImageNet Co-Author Hao Su Reportedly Joining Fudan University's Faculty
量子位· 2025-10-10 03:52
Core Viewpoint
- The article discusses the potential appointment of Hao Su, a prominent figure in embodied intelligence and computer vision, to Fudan University, highlighting his significant contributions to the field and his entrepreneurial ventures in robotics and simulation [1][49][51]

Group 1: Hao Su's Academic and Research Background
- Hao Su is an associate professor at the University of California, San Diego (UCSD), specializing in computer vision, graphics, embodied intelligence, and robotics [14][49]
- He was involved in the creation of ImageNet and has led foundational projects such as ShapeNet, PointNet, and SAPIEN, which have significantly advanced 2D and 3D vision [4][30][34]
- His research evolved from natural language processing to computer vision and then to 3D vision, culminating in large-scale datasets and models that have transformed the landscape of artificial intelligence [22][30][34]

Group 2: Contributions to Robotics and Simulation
- In 2020, Su launched SAPIEN, the first simulator focused on generalizable robotic operations, and later developed the ManiSkill platform for training robotic skills [35][41]
- His company Hillbot, co-founded in 2024, aims to leverage high-fidelity simulation for robotics, with products like Hillbot Alpha designed for complex environments [43][45]
- Hillbot has partnered with Nvidia to generate high-quality training data, indicating a strong focus on enhancing robotic capabilities through advanced simulation techniques [47]

Group 3: Potential Move to Fudan University
- Rumors suggest Su will join Fudan University, which may invest in Hillbot and potentially appoint him to dual roles at various research institutes [51][52]
- Fudan University has established a credible embodied-intelligence research institute offering competitive salaries and performance-based incentives, which could attract top talent like Su [55][57]
Job-Hopping Diary of a Humble Algorithm Engineer, 2024 & 2025 Edition
自动驾驶之心· 2025-10-06 04:05
Core Insights
- The article recounts the author's experience of job searching and interviewing, highlighting the challenges and changes in the job market, particularly in computer vision (CV) and deep learning [4][6][8]

Job Search Experience
- The author averaged six interviews per day over a month, peaking at eight in a single day, reflecting a highly competitive job market [4][5]
- The author moved on from a CV role at a delivery company to seek opportunities in more stable and specialized areas, reflecting a shift in personal career focus [6][8]

Market Trends
- Job opportunities have increased significantly compared to previous years, with many large and mid-sized companies actively hiring [8]
- Demand for traditional CV roles has shrunk, with a notable shift toward large models, multimodal applications, and end-to-end models in the autonomous driving sector [8][10]

Interview Preparation
- The author prepared by reviewing popular coding problems, particularly from LeetCode; companies now test coding skills more rigorously than in the past [9][10]
- Many interview questions came from the "Hot100" list of coding problems, underscoring the importance of algorithmic knowledge in technical interviews [11]

Career Transition
- After many interviews, the author received offers from companies such as Kuaishou, Xiaomi, and Weibo, but struggled to land positions at larger firms like Alibaba and Baidu [10]
- The author ultimately accepted a position at a foreign company with a markedly better work environment than previous domestic employers, highlighting differences in corporate culture [10][12]

Technical Skills and Trends
- The author observed a shift in the technical skills the market demands, with growing emphasis on large models and multimodal technologies; professionals in the field need to adapt to these changes to remain competitive [13]
Freshly Graduated AI PhDs Are Going Unsold
投资界· 2025-09-28 07:35
Core Viewpoint
- The article highlights the stark divide in the job market for AI PhDs: top-tier talent is highly sought after and well rewarded, while the majority of average AI PhDs struggle to find suitable positions, producing a polarized landscape [5][10]

Group 1: Job Market Dynamics
- The market is sharply split, with top candidates receiving lucrative offers while average candidates face rejection and limited opportunities [5][10]
- Companies are increasingly selective; hiring ratios for desirable positions often exceed 10:1, and reach as high as 200:1 for certain roles [8][9]
- Many average AI PhDs sit in a "talent pool," waiting for opportunities that may never materialize, lacking the credentials or connections to secure positions [9][10]

Group 2: Recruitment Challenges
- Average AI PhDs often cannot meet companies' high expectations for extensive publication records and relevant experience [12][21]
- The pressure to publish is immense; some candidates feel compelled to produce papers lacking genuine innovation just to satisfy job application requirements [13][18]
- The recruitment process is long and frequently ends in disappointment, with candidates enduring extended waits only to receive rejection notices [7][8]

Group 3: The Role of Networking
- Networking is crucial to securing offers, with many positions filled through personal connections rather than qualifications alone [21][22]
- Companies prefer candidates recommended by trusted sources such as former professors or industry contacts, disadvantaging those without such connections [21][22]
- Since many openings are never publicly advertised, building relationships within the industry matters greatly [21][22]

Group 4: Industry Trends and Expectations
- The AI industry is rapidly shifting toward commercial applications and general-purpose models, which often do not align with the specialized research backgrounds of many PhDs [18][20]
- Companies increasingly want candidates who can contribute to immediate business needs rather than niche expertise, creating a mismatch between academic training and industry requirements [19][20]
- Competition for top AI talent is intensifying, with leading companies offering attractive packages that further widen the gap for average PhDs [10][11]
Eyelash Extension Robot: Fast, High-Quality Lash Grafting
Ke Ji Ri Bao· 2025-09-18 00:17
(Source: Science and Technology Daily)

These days, many beauty-conscious consumers get eyelash extensions on a regular basis. Manual grafting used to take more than two hours; the latest eyelash-extension robot can now deliver the service far more efficiently.

The robot integrates cutting-edge technologies such as computer vision and artificial intelligence, shortening the two-plus-hour grafting process to 20 minutes. A lash technician first cleans the client's lashes, removing dust and oil so the extensions hold, then applies an eye patch printed with guide barcodes that gives the robot a positioning reference. Once preparation is complete, the robot takes over: using computer vision, it precisely scans each user's eye contour and makes real-time dynamic adjustments in response to micro-movements of the facial muscles, producing a one-of-a-kind lash design for every user. At the same time, its AI algorithms keep the placement error of each grafted lash within 10 microns.
Apple's First Smart Glasses to Focus on Screenless Design, Expected Within 12 to 16 Months
Huan Qiu Wang Zi Xun· 2025-09-15 04:20
[Huanqiu Tech Report] September 15 — According to AppleInsider, Bloomberg analyst Mark Gurman recently analyzed the progress of Apple's smart-glasses development on a podcast, saying Apple plans to launch its first smart glasses within the next 12 to 16 months, and that the product will use a screenless design positioned to compete with Meta's Ray-Ban smart glasses.

The screenless glasses will reportedly carry cameras and an audio system capable of playback and recording, but will need to connect to an iPhone for data processing. The full smart-glasses experience consumers are hoping for, with content displayed through the lenses, is still years away. Gurman explained that device miniaturization and weight reduction are the core bottlenecks: Apple's goal is to make the glasses as close as possible to ordinary lightweight eyewear, avoiding the usability limits that the Apple Vision Pro's size and weight imposed.

In fact, on launch timing, analysts such as Ming-Chi Kuo had earlier suggested a 2026 or 2027 release, and Gurman has been a firm backer of that timeline. He has repeatedly reported that the glasses will most likely ship in 2026, while not ruling out a slip to 2027 for reasons such as further technical refinement.

On the competitive front, Gurman stressed that Apple holds natural advantages in smart glasses. For one thing, leveraging Apple's strong ...

Source: Huanqiu.com