Results Are Out! The Latest ICCV 2025 Roundup (Autonomous Driving / Embodied AI / 3D Vision / LLM / CV, etc.)
自动驾驶之心· 2025-06-28 13:34
Core Insights
- The article discusses the ICCV 2025 acceptance announcements, highlighting the excitement around the newly released works related to autonomous driving and advances in the field [2].

Group 1: Autonomous Driving Innovations
- DriveArena is introduced as a controllable generative simulation platform aimed at enhancing autonomous driving capabilities [4].
- Epona presents an autoregressive diffusion world model designed specifically for autonomous driving applications [4].
- SynthDrive offers a scalable Real2Sim2Real sensor simulation pipeline for high-fidelity asset generation and driving data synthesis [4].
- StableDepth focuses on scene-consistent, scale-invariant monocular depth estimation, which is crucial for improving perception in autonomous vehicles [4].
- CoopTrack explores end-to-end learning for efficient cooperative sequential perception, strengthening the collaborative capabilities of autonomous systems [4].

Group 2: Image and Vision Technologies
- CycleVAR repurposes autoregressive models for unsupervised one-step image translation, which can benefit visual recognition tasks in autonomous driving [5].
- CoST targets efficient collaborative perception from a unified spatiotemporal perspective, which is essential for real-time decision-making in autonomous vehicles [5].
- Hi3DGen generates high-fidelity 3D geometry from images via normal bridging, improving the spatial understanding of environments for autonomous systems [5].
- GS-Occ3D scales vision-only occupancy reconstruction for autonomous driving using Gaussian splatting [5].

Group 3: Large Model Applications
- ETA introduces a dual approach to self-driving with large models, enhancing the efficiency and effectiveness of autonomous driving systems [5].
- Taming the Untamed discusses graph-based knowledge retrieval and reasoning for multimodal large language models (MLLMs), which can significantly improve decision-making in autonomous driving [7].
An Incomplete ICCV 2025 Roundup (Embodied AI / Autonomous Driving / 3D Vision / LLM / CV, etc.)
具身智能之心· 2025-06-27 09:41
Group 1
- The article discusses the recent announcements from ICCV 2025, highlighting various works that have been accepted for presentation [1].
- It emphasizes the role of the "Embodied Intelligence" community in sharing insights and developments related to the accepted works [1].
- It encourages readers to join the community for timely updates on ongoing research and development in the field [1].

Group 2
- Several works related to embodied intelligence and autonomous driving are summarized, showcasing advances in areas such as robotic manipulation and navigation [4][6].
- The article lists projects including "GaussianProperty" and "DriveArena," which focus on integrating physical properties and on generative simulation for autonomous driving [4].
- It also covers works on 3D reconstruction and visual recognition, indicating the broad range of research topics being explored [5][6].
After Holding Back for a Year, Has Apple Finally Surpassed Same-Size Qwen 2.5? Three Lines of Code to Plug Into Apple Intelligence, and Apple Reveals How It Does Reasoning
AI前线· 2025-06-10 10:05
Core Insights
- Apple has introduced a new generation of language foundation models designed to power Apple Intelligence, featuring a compact on-device model with approximately 3 billion parameters and a server-side mixture-of-experts model tailored to its private cloud architecture [1][4][6].

Model Overview
- The new Foundation Models framework lets third-party developers access the core large language models behind Apple Intelligence and integrate them into their applications with minimal code [4][20].
- The device-side model is optimized for efficiency and low latency on Apple silicon, while the server-side model supports higher precision and scalability for more complex tasks [6][7].

Performance Evaluation
- Apple's device-side model outperforms slightly larger models such as Qwen-2.5-3B across all language settings and is competitive with the larger Qwen-3-4B in English [8][10].
- The server-side model outperforms Llama-4-Scout but lags behind larger models such as Qwen-3-235B and the proprietary GPT-4o [8][10].

Architectural Innovations
- The device-side model reduces key-value cache memory usage by 38.5% and improves time-to-first-token generation [7].
- The server-side model employs a parallel-track mixture-of-experts (PT-MoE) design, improving efficiency and scalability without compromising quality [7][8].

Training Improvements
- Apple has revamped its training recipe to strengthen reasoning capabilities, using a multi-stage pre-training process that significantly reduces training cost [14][16].
- Visual understanding has been integrated into the models without degrading text capabilities, improving overall performance [16].

Compression Techniques
- Apple employs quantization to reduce model size and power consumption, compressing device-side model weights to 2 bits per weight and server-side model weights to 3.56 bits per weight [17][18].
- The models maintain quality through additional training data and low-rank adapters, with only minor regressions observed in performance metrics [17].

Developer Accessibility
- The Foundation Models framework is designed to be developer-friendly, allowing AI capabilities to be integrated into applications with just three lines of code (a hedged Swift sketch follows this section) [20][21].
- The framework natively supports Swift and includes guided generation and tool invocation, simplifying the integration process [20][21].

Current Status
- The Foundation Models framework is currently in testing through the Apple Developer Program, with a public beta expected to be available soon [22].
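To make the "three lines of code" claim concrete, here is a minimal Swift sketch of calling the on-device model through the Foundation Models framework. The `LanguageModelSession` and `respond(to:)` names follow Apple's public WWDC 2025 materials, but treat this as an illustrative sketch under those assumptions, not verified production code; exact API names, return types, and OS availability may differ in the shipping SDK.

```swift
import FoundationModels

// Minimal sketch: one session, one prompt, one response.
// Assumes the LanguageModelSession API shown in Apple's WWDC 2025 materials
// and an Apple Intelligence-enabled OS; names and availability may differ.
func summarize(_ text: String) async throws -> String {
    let session = LanguageModelSession()                                          // 1: open a session backed by the on-device model
    let response = try await session.respond(to: "Summarize in one sentence: \(text)") // 2: send the prompt
    return response.content                                                       // 3: read the generated text
}
```

In Apple's documentation, guided generation (typed output schemas) and tool invocation are layered on the same session object, so a real integration would usually replace the raw string prompt above with a structured output type or registered tools.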
A Single Markdown File Earns 400+ Stars: This Survey Breaks 3D Scene Generation Down Into Four Paradigms
机器之心· 2025-06-10 08:41
Core Insights
- The article discusses advances in 3D scene generation, highlighting a comprehensive survey that categorizes existing methods into four main paradigms: procedural methods, neural-network-based 3D representation generation, image-driven generation, and video-driven generation [2][4][7].

Summary by Sections

Overview of 3D Scene Generation
- A survey titled "3D Scene Generation: A Survey" reviews over 300 representative papers and outlines the rapid growth of the field since 2021, driven by the rise of generative models and new 3D representations [2][4][5].

Four Main Paradigms
- The four paradigms provide a clear technical roadmap for 3D scene generation, with methods compared along dimensions such as realism, diversity, viewpoint consistency, semantic consistency, efficiency, controllability, and physical realism [7].

Procedural Generation
- Procedural methods automatically construct complex 3D environments from predefined rules and constraints and are widely applied in games and graphics engines. This category can be further divided into neural-network-based generation, rule-based generation, constraint optimization, and large-language-model-assisted generation (a small rule-based sketch follows this section) [8].

Image-based and Video-based Generation
- Image-based generation leverages 2D image models to reconstruct 3D structure, while video-based generation treats a 3D scene as a sequence of images, integrating spatial modeling with temporal consistency [9].

Challenges in 3D Scene Generation
- Despite significant progress, controllable, high-fidelity, and physically realistic 3D modeling remains challenging. Key issues include uneven generation capabilities, the need for improved 3D representations, limited high-quality data, and the lack of unified evaluation standards [10][16].

Future Directions
- Future work should focus on higher-fidelity generation, parameter control, holistic scene generation, and integrating physical constraints to ensure structural and semantic consistency. Supporting interactive scene generation and unifying perception and generation capabilities are also seen as crucial for the next generation of 3D modeling systems [12][18].
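As a concrete, minimal illustration of the rule-and-constraint idea behind procedural generation, the Swift sketch below scatters object footprints in a rectangular room under two hand-written rules: keep a margin from the walls and reject overlapping placements. The `Rect` and `placeObjects` names are made up for this toy example; it is not code from the survey or from any system it cites, and real procedural pipelines layer grammars, constraint solvers, or LLM guidance on top of this basic loop.

```swift
import Foundation

// Toy rule-based procedural placement:
// Rule 1 - sample positions only inside a wall-margin band.
// Rule 2 - reject candidates that overlap anything already placed.
struct Rect {
    var x, y, w, h: Double
    func intersects(_ other: Rect) -> Bool {
        x < other.x + other.w && other.x < x + w &&
        y < other.y + other.h && other.y < y + h
    }
}

func placeObjects(count: Int, roomW: Double, roomH: Double,
                  objectSize: Double, wallMargin: Double,
                  maxAttempts: Int = 1000) -> [Rect] {
    var placed: [Rect] = []
    var attempts = 0
    while placed.count < count && attempts < maxAttempts {
        attempts += 1
        let x = Double.random(in: wallMargin...(roomW - wallMargin - objectSize))
        let y = Double.random(in: wallMargin...(roomH - wallMargin - objectSize))
        let candidate = Rect(x: x, y: y, w: objectSize, h: objectSize)
        if placed.allSatisfy({ !$0.intersects(candidate) }) {
            placed.append(candidate)   // both rules satisfied: keep the object
        }
    }
    return placed
}

// Example: lay out six 1.5 m objects in a 10 m x 8 m room with a 0.5 m wall margin.
let layout = placeObjects(count: 6, roomW: 10, roomH: 8, objectSize: 1.5, wallMargin: 0.5)
for (i, r) in layout.enumerated() {
    print("object \(i): x=\(r.x), y=\(r.y)")
}
```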
Can Someone Really Fall in Love With ChatGPT? After a Week of "Dating" an AI, I Noticed Something Was Off
Hu Xiu· 2025-05-11 07:02
Group 1
- The article discusses the growing phenomenon of human-AI relationships, highlighting cases in which individuals have developed emotional connections with AI that led to significant life decisions such as divorce or "marriage" to an AI [2][35][41].
- Some users have become so immersed in their interactions with AI that they perceive it as a friend or partner, which raises concerns about the implications for real-life relationships and mental health [6][41][49].
- The article emphasizes that users should be aware of the potential for dependency on AI, especially those with underlying psychological issues, and suggests that AI should not replace human interaction [42][57].

Group 2
- The text outlines strategies for users to get more out of their interactions with AI, such as customizing prompts and learning the AI's response patterns to create a more engaging experience [9][31][44].
- It highlights the value of treating AI as a conversational partner rather than just a tool, which can lead to deeper self-reflection and personal insight for users [32][41].
- The article also points out AI's limitations: while it can provide immediate feedback and companionship, it lacks true emotional understanding and memory retention, which can lead to disillusionment [55][56].