Computer Vision
A rundown of all the ICCV awards: congratulations to Jun-Yan Zhu's team on winning Best Paper
具身智能之心· 2025-10-26 04:02
Core Insights
- The article highlights the significant presence of Chinese authors at ICCV 2025, accounting for 50% of submissions and showcasing China's growing influence in computer vision [1].

Awards and Recognitions
- The Best Paper Award (Marr Prize) went to "Generating Physically Stable and Buildable Brick Structures from Text," which introduced BRICKGPT, a model that generates stable brick structures from textual prompts [4][24].
- The Best Student Paper Award went to "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which presents a method for editing images without inversion [6][38].
- Best Paper honorable mentions included "Spatially-Varying Autofocus," which allows a camera to focus at different depths simultaneously [7][42].
- Best Student Paper honorable mentions included "RayZer: A Self-supervised Large View Synthesis Model," which recovers camera parameters on its own and synthesizes new viewpoints from uncalibrated images [9][47].

Notable Research Contributions
- The BRICKGPT model was trained on a dataset of over 47,000 brick structures and can generate aesthetically pleasing, stable designs that can be assembled by hand or by robotic arms [24][26].
- FlowEdit constructs a differential equation that maps the source distribution directly to the target distribution, achieving state-of-the-art results without model-specific dependencies [39][40].
- "Fast R-CNN," awarded a Helmholtz Prize, significantly improved training and testing speeds while raising detection accuracy in object recognition tasks [10][54].
- Research on modified activation functions, which introduced a parametric ReLU, achieved a 4.94% top-5 test error on ImageNet, surpassing human-level performance (a minimal sketch of the activation appears below) [58][60].

Awarded Teams and Individuals
- The SMPL body model team built a highly accurate 3D human model from extensive 3D scan data, with good compatibility with mainstream rendering pipelines [62][66].
- The VQA team created a visual question answering dataset of roughly 250,000 images and 7.6 million questions, enabling deeper understanding of and reasoning about image content [68][69].
- David Forsyth and Michal Irani received the Distinguished Researcher Award for their contributions to computer vision and machine learning [72][75].
- Rama Chellappa was honored with the Azriel Rosenfeld Lifetime Achievement Award for his extensive work in computer vision and pattern recognition [78].
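The parametric ReLU mentioned in the activation-function result above learns the slope of the negative branch instead of fixing it at zero. A minimal PyTorch sketch of the activation (the channel count and the 0.25 initialization are illustrative assumptions; in practice the built-in `torch.nn.PReLU` would be used):

```python
import torch
import torch.nn as nn

class PReLUSketch(nn.Module):
    """Parametric ReLU: f(x) = x if x > 0 else alpha * x, with alpha learned per channel."""
    def __init__(self, num_channels: int, init_alpha: float = 0.25):
        super().__init__()
        # One learnable slope per channel, broadcast over the spatial dimensions.
        self.alpha = nn.Parameter(torch.full((num_channels, 1, 1), init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Positive part passes through; negative part is scaled by the learned slope.
        return torch.clamp(x, min=0) + self.alpha * torch.clamp(x, max=0)

# Usage: apply after a convolution, e.g. on a (batch, 64, H, W) feature map.
act = PReLUSketch(num_channels=64)
y = act(torch.randn(2, 64, 32, 32))
```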
Breaking: ICCV Best Paper announced, with Jun-Yan Zhu's team taking the crown for brick structures
具身智能之心· 2025-10-23 00:03
Core Insights
- The article covers the recent International Conference on Computer Vision (ICCV), held in Hawaii, highlighting the award-winning papers and their contributions to computer vision [2][5][24].

Group 1: Award Winners
- The Best Paper Award went to a research team from Carnegie Mellon University (CMU) for "Generating Physically Stable and Buildable Brick Structures from Text," led by noted AI scholar Jun-Yan Zhu [3][7][11].
- The Best Student Paper Award went to a paper from the Technion, "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which introduces a novel image-editing method [28][30].

Group 2: Conference Statistics
- ICCV is one of the top three computer vision conferences and is held biennially. This year's edition received 11,239 valid submissions, a significant increase over the previous conference, and accepted 2,699 papers for a 24% acceptance rate [5].

Group 3: Research Contributions
- The CMU paper presents BrickGPT, the first method capable of generating physically stable, interconnected brick assembly models from text prompts. The work includes a large dataset of over 47,000 brick structures covering 28,000 unique 3D objects with detailed descriptions [11][13].
- The Technion's FlowEdit proposes an image-editing approach that bypasses the traditional image-to-noise inversion step, achieving higher-fidelity edits by establishing a direct mapping path between the source and target image distributions [32][34].

Group 4: Methodology and Results
- BrickGPT discretizes a brick structure into a sequence of tokens and trains an autoregressive large language model on them, incorporating validity checks and a physics-aware rollback mechanism to keep generated designs stable (a hedged sketch of such a generation loop appears below) [13][19].
- Experimental results show BrickGPT outperforming baseline models in both validity and stability, achieving a 100% validity rate and 98.8% stability on generated structures [20][22].
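To make the "validity checks and physics-aware rollback mechanism" concrete, below is a hedged, self-contained toy sketch of what such a generation loop can look like. The grid representation, the random `sample_next_brick` stand-in for the language model, and the footprint-support stability test are all illustrative assumptions; the paper's actual tokenization and physics solver are not reproduced here.

```python
import random

# A toy brick: (x, y, z, w, d) -- grid position plus footprint, one unit tall.

def cells(brick):
    x, y, z, w, d = brick
    return {(x + i, y + j, z) for i in range(w) for j in range(d)}

def occupied_above(structure):
    # Grid cells sitting directly on top of any existing brick.
    return {(cx, cy, cz + 1) for b in structure for (cx, cy, cz) in cells(b)}

def collides(brick, structure):
    taken = set().union(*map(cells, structure)) if structure else set()
    return bool(cells(brick) & taken)

def is_connected(brick, structure):
    # Valid if the brick rests on the ground or on top of an existing brick.
    return brick[2] == 0 or bool(cells(brick) & occupied_above(structure))

def is_physically_stable(structure):
    # Toy proxy for the physics check: every elevated brick needs at least half
    # of its footprint supported from below. The real method uses a physics solver.
    support_cells = occupied_above(structure)
    for brick in structure:
        if brick[2] == 0:
            continue
        if 2 * len(cells(brick) & support_cells) < brick[3] * brick[4]:
            return False
    return True

def sample_next_brick(rng, structure):
    # Stand-in for the autoregressive model's next-brick prediction.
    if len(structure) >= 30:
        return None
    top = max((b[2] for b in structure), default=-1)
    return (rng.randint(0, 6), rng.randint(0, 6), rng.randint(0, top + 1), 2, rng.choice([2, 4]))

def generate_structure(seed=0, max_steps=2000, max_rollbacks=50):
    rng, structure, rollbacks = random.Random(seed), [], 0
    for _ in range(max_steps):
        brick = sample_next_brick(rng, structure)
        if brick is None:
            break
        # Validity check: reject overlapping or floating bricks outright.
        if collides(brick, structure) or not is_connected(brick, structure):
            continue
        structure.append(brick)
        # Physics-aware rollback: undo the brick if the assembly became unstable.
        if not is_physically_stable(structure):
            structure.pop()
            rollbacks += 1
            if rollbacks > max_rollbacks:
                break
    return structure

print(len(generate_structure()), "bricks placed")
```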
A rundown of all the ICCV awards: congratulations to Jun-Yan Zhu's team on winning Best Paper
量子位· 2025-10-22 05:48
Core Points
- The ICCV 2025 conference in Hawaii highlighted significant contributions from Chinese researchers, who accounted for 50% of paper submissions [1].
- Various prestigious awards were announced, showcasing advances in computer vision research [3].

Award Highlights
- Best Paper Award (Marr Prize): "Generating Physically Stable and Buildable Brick Structures from Text" introduced BRICKGPT, a model that generates stable brick structures from text prompts, trained on a dataset of over 47,000 structures [4][24][26].
- Best Student Paper Award: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models" proposed an inversion-free image-editing method that achieves state-of-the-art results [6][39][40].
- Best Paper Honorable Mention: "Spatially-Varying Autofocus" developed a technique for dynamically varying focal depth across the image, improving focus sharpness throughout a scene [7][42][44].
- Best Student Paper Honorable Mention: "RayZer: A Self-supervised Large View Synthesis Model" demonstrated 3D perception capabilities from uncalibrated images [9][47][49].

Special Awards
- Helmholtz Prize: awarded to "Fast R-CNN" for efficient object detection that significantly improved training and testing speeds (a sketch of the RoI-pooling idea behind that speedup appears below) [10][52][54].
- A second Helmholtz Prize went to research on rectified activation functions, which surpassed human-level accuracy on ImageNet [10][59][60].
- Everingham Prize: recognized teams for their contributions to 3D modeling and visual question answering [12][63][68].
- Distinguished Researcher Award: David Forsyth and Michal Irani were honored for their impactful work in computer vision [14][73][76].
- Azriel Rosenfeld Lifetime Achievement Award: Rama Chellappa was recognized for his extensive contributions to the field [16][79].

Research Contributions
- The BRICKGPT model generates physically stable structures, drawing on a large dataset and mechanisms designed to preserve stability [24][26].
- FlowEdit's approach allows seamless image editing across different model architectures, enhancing flexibility in applications [39][40].
- The spatially-varying autofocus technique improves image sharpness by dynamically adjusting focus based on scene depth [42][44].
- RayZer's self-supervised learning approach enables 3D scene reconstruction without calibrated camera data [47][49].

Conclusion
- ICCV 2025 showcased groundbreaking research and innovations in computer vision, with significant contributions from many teams and individuals, and particularly notable achievements by Chinese researchers [1][3].
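Fast R-CNN's speedup comes from running the convolutional backbone once per image and then pooling a fixed-size feature for every region proposal out of the shared feature map. A minimal illustration of that RoI-pooling idea using `torchvision.ops.roi_pool` (the feature-map size, stride, and box coordinates below are made-up values, not the paper's configuration):

```python
import torch
from torchvision.ops import roi_pool

# One shared feature map per image (e.g. the output of a conv backbone at stride 16).
features = torch.randn(1, 256, 32, 32)            # (batch, channels, H/16, W/16)

# Region proposals in image coordinates: (batch_index, x1, y1, x2, y2).
proposals = torch.tensor([
    [0.0,  16.0,  16.0, 128.0, 128.0],
    [0.0,  64.0,  32.0, 256.0, 300.0],
])

# Each proposal is pooled to the same 7x7 grid from the shared map, so the
# per-region cost is tiny compared with re-running the backbone per region.
pooled = roi_pool(features, proposals, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)   # torch.Size([2, 256, 7, 7])
```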
New CVPR 2026 rules: compute costs must be disclosed, and high-efficiency, high-transparency papers can earn three recognition awards
机器之心· 2025-10-22 03:30
Core Viewpoint
- CVPR has introduced a pilot program, the "Compute Resource Report Form (CRF)," to improve transparency and fairness in AI research by requiring authors to report the computational resources used in their work, starting with CVPR 2026 [2][4].

Group 1: CRF Implementation
- All submitters must provide a detailed report of the computational resources used, including GPU and CPU usage, training time, and model efficiency (a toy example of totaling GPU-hours appears below) [7].
- Submitting the CRF is mandatory but will not affect the paper acceptance decision; the data will be reviewed by an independent committee [6][16].
- The CRF asks for basic information about hardware, computation time, and performance results relative to the strongest baselines [7][31].

Group 2: Recognition Awards
- Papers demonstrating outstanding computational efficiency and/or transparency may qualify for recognition awards: the Efficient CVPR Badge, the CVPR Compute Star Award, and the CVPR Compute Transparency Champion Award [9][10].
- Awards will be judged on objective metrics, with assessments kept fair within comparable task categories [10][27].

Group 3: Submission Process
- Authors are encouraged to review a pre-filled example form to understand how to complete each section of the CRF [11][19].
- The mandatory sections typically take 10-15 minutes to complete; optional sections may require additional time [28].
- Authors should save the original PDF of the filled-in form without flattening it, so that the data fields remain machine-readable for processing [12][20].

Group 4: FAQs and Clarifications
- The CRF aims to provide insight into the actual computational cost of different methods and is a transparency experiment, not a judgment mechanism [15][18].
- High computational resource usage will not be penalized, since significant advances often require substantial resources [17].
- Submitting anonymized Weights & Biases logs is optional but can improve the chances of receiving a recognition award [26].
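For the hardware and compute-time fields, the arithmetic behind the form is simple: GPU-hours are the product of GPUs used and wall-clock time, summed over runs. A hypothetical helper for that bookkeeping (the `TrainingRun` fields and the run list are illustrative; the actual CRF fields are whatever CVPR defines):

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    gpu_model: str      # e.g. "A100-80GB"
    num_gpus: int       # GPUs used concurrently
    hours: float        # wall-clock hours for the run

def total_gpu_hours(runs):
    """GPU-hours = sum over runs of (number of GPUs x wall-clock hours)."""
    return sum(r.num_gpus * r.hours for r in runs)

runs = [
    TrainingRun("A100-80GB", 8, 36.0),    # main training run
    TrainingRun("A100-80GB", 8, 12.5),    # ablations
    TrainingRun("V100-32GB", 4, 20.0),    # early prototyping
]
print(f"Total: {total_gpu_hours(runs):.1f} GPU-hours")   # Total: 468.0 GPU-hours
```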
Breaking: ICCV Best Paper announced, with Jun-Yan Zhu's team taking the crown for brick structures
机器之心· 2025-10-22 03:30
Core Insights
- ICCV (the International Conference on Computer Vision) announced its Best Paper and Best Student Paper on October 22, 2025, highlighting significant advances in computer vision research [1][2][4].

Group 1: Best Paper
- The Best Paper Award went to a research team from Carnegie Mellon University (CMU) for "Generating Physically Stable and Buildable Brick Structures from Text," led by noted AI scholar Jun-Yan Zhu [6][9].
- The paper introduces BrickGPT, a novel method that generates physically stable, interconnected brick assembly models from text prompts, marking a significant advance in the field [9][11].
- The team built a large-scale dataset of stable brick structures, comprising over 47,000 models and 28,000 unique 3D objects with detailed text descriptions, to train the model [11][10].

Group 2: Methodology and Results
- The method discretizes a brick structure into a sequence of text tokens and trains a large language model to predict the next brick to add, enforcing physical stability through validity checks and a rollback mechanism [10][17].
- Experimental results show BrickGPT achieving a 100% validity rate and a 98.8% stability rate, outperforming various baseline models in both effectiveness and stability [20][18].
- The approach generates diverse, aesthetically pleasing brick structures that align closely with the input text prompts, demonstrating high design fidelity [11][20].

Group 3: Best Student Paper
- The Best Student Paper Award went to a paper from the Technion, "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which bypasses the traditional inversion-based editing path to improve image fidelity [25][28].
- FlowEdit establishes a direct mapping path between the source and target image distributions, yielding lower transport costs and better preservation of the original image structure during editing (a hedged sketch of this inversion-free idea appears below) [31][27].
- The method was validated on state-of-the-art T2I flow models and achieved leading results across a range of complex editing tasks, demonstrating its efficiency [31].

Group 4: Other Awards and Recognitions
- The Helmholtz Prize recognized two significant papers for their lasting contributions to computer vision, including "Fast R-CNN" by Ross Girshick, which improved detection speed and accuracy [36][38].
- The Everingham Prize recognized teams for their contributions to 3D modeling and multimodal AI, including the SMPL model and the VQA dataset [41][43].
- Distinguished Researcher Awards went to David Forsyth and Michal Irani for their impactful contributions to computer vision [50][52].
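As a rough illustration of what "inversion-free, direct mapping between source and target distributions" can mean in code, here is a heavily simplified sketch: instead of inverting the image to noise and regenerating, the edited latent is nudged at each step by the difference between the flow model's target-prompt and source-prompt velocities, evaluated on identically noised versions of the source and the current edit. The `toy_velocity` stand-in, the step count, and the noising schedule are assumptions made for illustration, not the paper's exact update rule.

```python
import torch

def inversion_free_edit(x_src, velocity, src_prompt, tgt_prompt, steps=28, seed=0):
    """Hedged sketch of an inversion-free editing loop: the source image is never
    inverted to noise; the edit accumulates along the difference between
    target- and source-conditioned velocities of a pretrained flow model."""
    gen = torch.Generator().manual_seed(seed)
    z_edit = x_src.clone()
    for i in range(steps):
        t = 1.0 - i / steps                              # noise level, swept high -> low
        noise = torch.randn(x_src.shape, generator=gen)
        z_src_t = (1 - t) * x_src + t * noise            # noised source image
        z_tgt_t = z_edit + (z_src_t - x_src)             # same noise applied to current edit
        dv = velocity(z_tgt_t, t, tgt_prompt) - velocity(z_src_t, t, src_prompt)
        z_edit = z_edit + dv / steps                     # integrate only the difference
    return z_edit

def toy_velocity(z, t, prompt):
    # Stand-in for a pretrained text-conditioned flow model.
    target = torch.ones_like(z) if prompt == "bright" else torch.zeros_like(z)
    return target - z

edited = inversion_free_edit(torch.zeros(1, 3, 8, 8), toy_velocity, "dark", "bright")
print(round(edited.mean().item(), 3))   # drifts from 0 toward 1 under the "bright" prompt
```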
Real-time warning of minute rock-mass changes: Shenzhen University team develops a geological disaster prevention system
Nan Fang Du Shi Bao· 2025-10-21 07:57
Core Viewpoint
- A research team led by Professor Huang Hui at Shenzhen University has developed a new-generation intelligent monitoring system for geological disasters that integrates computer vision, deep learning, and cloud-edge-end collaborative technology, transforming traditional point-based monitoring into comprehensive, intelligent monitoring [1][3].

Group 1: Limitations of Traditional Monitoring
- Traditional geological disaster monitoring relies heavily on embedded sensors and manual inspections, both of which have significant limitations [3].
- Sensors cover only preset points rather than entire risk areas, while manual inspections are constrained by weather and terrain, leaving many hazardous areas inaccessible [3].

Group 2: Technological Innovations
- The team proposed a core "cloud-edge-end" collaborative processing technology for graphic information, moving from point monitoring to area-wide prevention [3].
- The system combines computer graphics, computer vision, and deep learning, with breakthroughs in three key areas: capturing abnormal movements in monitored areas, identifying rockfall events with over 85% accuracy, and measuring target displacement with high precision [3].

Group 3: Application and Impact
- The system has demonstrated its value in multiple scenarios: 24-hour monitoring of tunnel entrances and high-slope sections of mountain roads, rockfall warnings for railways, stability monitoring in open-pit mining, and slope safety in water conservancy projects [5].
- It has been deployed in Shenzhen's Jiangangshan Park, providing continuous monitoring of and alarms for dangerous rocks and rockfalls [5].
- The monitoring device carries a large-capacity solar power system for uninterrupted operation, giving it strong environmental adaptability and energy self-sufficiency [5].
- The system captures minute changes in rock formations with high-resolution cameras, analyzes the data in real time with built-in intelligent algorithms, triggers multi-level alerts, and uploads data to a cloud management platform over 4G/5G networks (a toy sketch of this edge-side alerting pattern appears below) [5].
- The technology marks a shift from passive waiting to proactive prediction in geological disaster monitoring and early warning, entering a new phase of "full-domain perception, intelligent deduction, and precise warning" [5].
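As an illustration of the edge-side part of such a pipeline, below is a hypothetical toy loop: difference two frames, estimate how much of the monitored region changed, map that to a multi-level alert, and hand the result to a (stubbed) cloud upload. The thresholds, frame sizes, and alert labels are invented for the example and are not the Shenzhen University system's actual algorithms or settings.

```python
import numpy as np

# Illustrative thresholds on the fraction of pixels that changed between frames.
ALERT_LEVELS = [(0.20, "LEVEL 1: evacuate / close road"),
                (0.05, "LEVEL 2: dispatch inspection"),
                (0.01, "LEVEL 3: log and watch")]

def changed_fraction(prev_frame, frame, pixel_threshold=25):
    """Fraction of pixels whose grayscale value moved by more than pixel_threshold."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float((diff > pixel_threshold).mean())

def classify_alert(fraction):
    for threshold, label in ALERT_LEVELS:
        if fraction >= threshold:
            return label
    return None   # below all thresholds: no alert, keep monitoring

def upload_to_cloud(payload):
    # Placeholder for the 4G/5G upload to the cloud management platform.
    print("uploading:", payload)

# Simulated pair of frames: a stable scene with a small region that "moves".
rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
frame = prev_frame.copy()
frame[100:180, 200:320] = rng.integers(0, 256, size=(80, 120), dtype=np.uint8)

fraction = changed_fraction(prev_frame, frame)
alert = classify_alert(fraction)
if alert:
    upload_to_cloud({"changed_fraction": round(fraction, 4), "alert": alert})
```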
Apple sets its sights on Prompt AI: not to buy the product, but to get the Berkeley team's "visual brain"
36Kr· 2025-10-14 00:59
Core Insights
- Apple is in the final stages of negotiations to acquire Prompt AI, a computer vision startup, marking its most significant AI move since the $3 billion acquisition of Beats Electronics in 2014 [1][3].
- The deal centers on Prompt AI's core technology and team, in line with Silicon Valley's current "acquihire" trend [3].

Acquisition Details
- Prompt AI, founded in 2023, has a team of only 11 employees and has raised $5 million in seed funding [4][5].
- The company's flagship product, Seemour, is a key focus for Apple, although its commercialization has struggled and the app is slated to be taken offline [7][5].
- Employees not joining Apple will receive compensation and may apply for other positions within the company, indicating a cautious approach to the deal [7].

Technology Fit
- Seemour's capabilities align well with Apple's HomeKit ecosystem, addressing its weaknesses in home security [8].
- The app can accurately identify people and objects, generate scene descriptions, and protect user privacy by processing data locally [8].
- Seemour's underlying technology could also support other Apple initiatives, such as AI smart glasses and autonomous driving [8].

Acquisition Strategy
- Apple's acquisition strategy has historically favored small, focused teams over large-scale mergers, enabling rapid integration of technology and talent [9][10].
- The company has made many smaller acquisitions to strengthen its core products, in contrast to competitors such as Meta and Google, which pursue larger, more integrated deals [10][14].

Industry Context
- The acquisition reflects a broader trend in Silicon Valley, where tech giants increasingly use talent acquisition to bolster their AI capabilities [11][14].
- Apple's approach emphasizes precise fit with its existing ecosystem rather than broad coverage, focusing on specific needs in HomeKit, AR, and autonomous driving [14][15].
ImageNet author Hao Su reportedly taking a faculty post at Fudan University
量子位· 2025-10-10 03:52
Core Viewpoint
- The article discusses the reported appointment of Hao Su, a prominent figure in embodied intelligence and computer vision, to Fudan University, highlighting his contributions to the field and his entrepreneurial ventures in robotics and simulation [1][49][51].

Group 1: Hao Su's Academic and Research Background
- Hao Su is an associate professor at the University of California, San Diego (UCSD), specializing in computer vision, graphics, embodied intelligence, and robotics [14][49].
- He was involved in the creation of ImageNet and has led foundational projects such as ShapeNet, PointNet, and SAPIEN, which significantly advanced 2D and 3D vision [4][30][34].
- His research trajectory has moved from natural language processing to computer vision and then to 3D vision, culminating in large-scale datasets and models that have reshaped the landscape of artificial intelligence [22][30][34].

Group 2: Contributions to Robotics and Simulation
- In 2020, Su launched SAPIEN, the first simulator focused on generalizable robotic manipulation, and later developed the ManiSkill platform for training robotic skills [35][41].
- His company Hillbot, co-founded in 2024, aims to leverage high-fidelity simulation for robotics, with products such as Hillbot Alpha designed for complex environments [43][45].
- Hillbot has partnered with Nvidia to generate high-quality training data, underscoring its focus on advancing robotic capabilities through simulation [47].

Group 3: Potential Move to Fudan University
- Rumors suggest Su will join Fudan University, which may invest in Hillbot and potentially appoint him to dual roles at several research institutes [51][52].
- Fudan University has established a credible embodied intelligence research institute, offering competitive salaries and performance-based incentives that could attract top talent like Su [55][57].
A humble algorithm engineer's job-hopping diary, 2024 & 2025 edition
自动驾驶之心· 2025-10-06 04:05
Core Insights
- The article recounts the author's experience searching for jobs and interviewing, highlighting challenges and changes in the job market, particularly in computer vision (CV) and deep learning [4][6][8].

Job Search Experience
- The author went through a high volume of interviews, averaging six per day over a month and peaking at eight on some days, reflecting a competitive job market [4][5].
- The author moved from a CV-focused role at a delivery company toward more stable and specialized areas, reflecting a shift in personal career focus [6][8].

Market Trends
- Job opportunities have increased significantly compared with previous years, with many large and mid-sized companies actively hiring [8].
- Demand for traditional CV roles has shrunk, with a clear shift toward large models, multimodal applications, and end-to-end models in the autonomous driving sector [8][10].

Interview Preparation
- The author prepared by reviewing popular coding problems, particularly from LeetCode, as companies now test coding skills more rigorously than in the past [9][10].
- Many interview questions were drawn from the "Hot 100" list of coding problems, underscoring the importance of algorithmic knowledge in technical interviews [11].

Career Transition
- After several rounds of interviews, the author received offers from companies including Kuaishou, Xiaomi, and Weibo, but did not land positions at larger firms such as Alibaba and Baidu [10].
- The author ultimately accepted a position at a foreign company, describing the work environment as markedly better than at previous domestic employers and highlighting differences in corporate culture [10][12].

Technical Skills and Trends
- The author observed a shift in the technical skills the market demands, with growing emphasis on large models and multimodal technologies, suggesting that professionals must adapt to remain competitive [13].
Freshly graduated AI PhDs are going unsold
投资界· 2025-09-28 07:35
Core Viewpoint
- The article highlights the stark divide in the job market for AI PhDs: top-tier talent is highly sought after and richly rewarded, while the majority of average AI PhDs struggle to find suitable positions, producing a polarized landscape [5][10].

Group 1: Job Market Dynamics
- The market is split, with top-tier candidates receiving lucrative offers while average candidates face rejection and limited opportunities [5][10].
- Companies are increasingly selective, with hiring ratios for desirable positions often exceeding 10:1 and reaching as high as 200:1 for certain roles [8][9].
- Many average AI PhDs end up in a "talent pool," waiting for opportunities that may never materialize because they lack the credentials or connections to secure positions [9][10].

Group 2: Recruitment Challenges
- Average AI PhDs often cannot meet the high expectations of companies, which look for extensive publication records and relevant experience [12][21].
- The pressure to publish is intense; some candidates feel compelled to produce papers of limited genuine innovation simply to meet job application requirements [13][18].
- The recruitment process is long and often ends in disappointment, with candidates waiting through extended periods only to receive rejections [7][8].

Group 3: The Role of Networking
- Networking plays a crucial role in securing offers, with many positions filled through personal connections rather than qualifications alone [21][22].
- Companies often prefer candidates recommended by trusted sources, such as former professors or industry contacts, disadvantaging those without such connections [21][22].
- This reliance on networking underscores the importance of building relationships within the industry, since many openings are never publicly advertised [21][22].

Group 4: Industry Trends and Expectations
- The AI industry is evolving rapidly, with a strong focus on commercial applications and general-purpose models that may not match the specialized research backgrounds of many PhDs [18][20].
- Companies increasingly want candidates who can address immediate business needs rather than niche expertise, creating a mismatch between academic training and industry requirements [19][20].
- Competition for AI talent is intensifying, with top companies offering attractive packages to lure the best candidates, further widening the gap for average PhDs [10][11].