Spatial Large Models
Qunhe Technology Releases Spatial Large Models, Aiming to Solve the Spatial Consistency Problem in AI Video
36Kr· 2025-08-29 04:00
Core Insights
- Qunhe Technology launched its latest spatial models, SpatialLM 1.5 and SpatialGen, at its first Tech Day on August 25, emphasizing an open-source strategy to engage developers worldwide [1][4]
- SpatialLM 1.5 is designed to understand and generate spatial language: it creates structured 3D scene scripts from user text input, which shows its potential in robotics [1][2]
- SpatialGen focuses on generating multi-view images with temporal consistency, addressing current challenges in AI-generated video content [2][3]
Group 1: Model Features
- SpatialLM 1.5 uses a large language model to learn a new "spatial language," allowing it to describe spatial structures and relationships in 3D scenes accurately [1]
- The model can generate structured 3D scene scripts and assist in robot path planning and task execution, addressing the scarcity of interactive 3D data [2]
- SpatialGen uses a diffusion model architecture to create multi-view images from text and 3D layouts while maintaining spatial logic and consistency [2][3]
Group 2: Strategic Vision
- Qunhe Technology's strategy revolves around a "space editing tool - space synthesis data - space large model" framework, in which each part feeds the next to improve both model training and the tool experience [3]
- As of June 30, 2025, the company had accumulated over 441 million 3D models and 500 million structured 3D spatial scenes, which it uses for model development [3]
- The open-source initiative, started in 2018, aims to advance spatial model technology in collaboration with developers worldwide [3][4]
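The "structured 3D scene script" idea in this summary can be made concrete with a small sketch. The text format, the `Wall` and `Box` classes, and the helper names below are all hypothetical and chosen for illustration; they are not SpatialLM's actual output format or API. The sketch only shows why a structured script is useful: once a scene is expressed as explicit parameters, simple spatial checks become ordinary code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Wall:
    # Wall segment endpoints on the floor plane, in meters (hypothetical schema).
    x1: float
    y1: float
    x2: float
    y2: float
    height: float


@dataclass
class Box:
    # Axis-aligned furniture box: center and size in meters (hypothetical schema).
    label: str
    cx: float
    cy: float
    width: float
    depth: float
    height: float


def parse_scene(script: str) -> Tuple[List[Wall], List[Box]]:
    """Parse lines such as 'wall 0 0 4 0 2.8' or 'box sofa 2 0.5 2 0.9 0.8'."""
    walls: List[Wall] = []
    boxes: List[Box] = []
    for raw in script.strip().splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "wall":
            walls.append(Wall(*map(float, parts[1:6])))
        elif parts[0] == "box":
            boxes.append(Box(parts[1], *map(float, parts[2:7])))
        else:
            raise ValueError(f"unknown entry: {raw!r}")
    return walls, boxes


def room_bounds(walls: List[Wall]) -> Tuple[float, float, float, float]:
    """Bounding rectangle (xmin, ymin, xmax, ymax) of all wall endpoints."""
    xs = [v for w in walls for v in (w.x1, w.x2)]
    ys = [v for w in walls for v in (w.y1, w.y2)]
    return min(xs), min(ys), max(xs), max(ys)


def fits_in_room(box: Box, walls: List[Wall]) -> bool:
    """True if the box footprint lies inside the walls' bounding rectangle."""
    xmin, ymin, xmax, ymax = room_bounds(walls)
    return (xmin <= box.cx - box.width / 2 and box.cx + box.width / 2 <= xmax
            and ymin <= box.cy - box.depth / 2 and box.cy + box.depth / 2 <= ymax)


# A 4 m x 3 m room with one well-placed sofa and one lamp that pokes through a wall.
SCRIPT = """
wall 0 0 4 0 2.8
wall 4 0 4 3 2.8
wall 4 3 0 3 2.8
wall 0 3 0 0 2.8
box sofa 2.0 0.5 2.0 0.9 0.8
box lamp 4.2 1.0 0.4 0.4 1.6
"""

walls, boxes = parse_scene(SCRIPT)
for b in boxes:
    print(b.label, "fits" if fits_in_room(b, walls) else "is out of bounds")
```

The bounding-rectangle test is deliberately crude (it assumes a rectangular room and axis-aligned furniture); a real layout check would need polygon containment and object rotation.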
Qunhe Technology Open-Sources Two Spatial Large Models, Aiming to Solve the Problem Genie 3 Could Not Fully Solve
Founder Park· 2025-08-27 11:41
Core Viewpoint
- The article discusses the emergence of "world models" in AI, highlighting the release of Genie 3 by Google DeepMind and the advances in 3D spatial models by Qunhe Technology, which aim to address the challenge of spatial consistency in AI-generated environments [2][8].
Group 1: Types of World Models
- There are two main types of world models: video models like Sora and Genie 3, which simulate the physical world using 2D image sequences, and large-scale 3D models that focus on reconstructing 3D scenes [4][5].
- Video models struggle to maintain spatial consistency because they rely on 2D images, while 3D models face challenges in creating comprehensive spatial content from multiple angles [6][8].
Group 2: Qunhe Technology's Innovations
- Qunhe Technology introduced the first 3D indoor scene cognition and generation model, SpatialGen, which addresses spatial consistency issues by generating a navigable 3D space that supports switching to any viewpoint [8][10].
- SpatialLM 1.5, a spatial language model, allows users to generate interactive 3D scenes through natural language commands, significantly improving usability for non-experts [10][11].
Group 3: Technical Foundations
- SpatialGen uses multi-view diffusion and 3D Gaussian reconstruction to ensure that lighting and texture remain consistent across different viewpoints [14][15].
- The models are built on extensive 3D spatial data, with Qunhe's tools generating structured 3D data that includes physical parameters and spatial relationships [16][18].
Group 4: Market Opportunities and Challenges
- The current state of spatial models is likened to early versions of GPT: they have foundational capabilities but are not yet universally applicable [20].
- The demand for AI-generated short films presents a significant opportunity, as these models can improve scene coherence and production efficiency, addressing common issues in traditional AI tools [21][22].
Group 5: Future Directions
- Qunhe Technology is developing an AI video generation product that integrates 3D capabilities to further improve spatial consistency in generated content [24].
- The company aims to bridge the gap between virtual and real-world applications, particularly in robotics, by providing structured 3D data that can be used for training [41].
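The contrast this summary draws between 2D video models and explicit 3D scenes can be illustrated with basic pinhole-camera geometry. The sketch below is not SpatialGen's pipeline (which the article describes as multi-view diffusion plus 3D Gaussian reconstruction); the camera intrinsics, the two camera poses, and all function names are assumptions made for illustration. It shows that when two views are derived from one shared 3D point, each view's pixel and depth map back to the same world position, which is the geometric property a purely 2D generator has no built-in way to guarantee.

```python
import math

FOCAL = 500.0          # focal length in pixels (assumed)
CX, CY = 320.0, 240.0  # principal point of a 640x480 image (assumed)


def world_to_camera(point, cam_pos, yaw):
    """Express a world point in the frame of a camera rotated by `yaw` about the Y axis."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the transpose (inverse) of the camera-to-world rotation about Y.
    return (c * x - s * z, y, s * x + c * z)


def camera_to_world(point, cam_pos, yaw):
    """Inverse of world_to_camera: rotate back, then translate by the camera position."""
    xc, yc, zc = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * xc + s * zc + cam_pos[0], yc + cam_pos[1], -s * xc + c * zc + cam_pos[2])


def project(point, cam_pos, yaw):
    """Pinhole projection: returns pixel coordinates (u, v) and depth along the view axis."""
    xc, yc, zc = world_to_camera(point, cam_pos, yaw)
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (FOCAL * xc / zc + CX, FOCAL * yc / zc + CY, zc)


def unproject(u, v, depth, cam_pos, yaw):
    """Lift a pixel with known depth back to a 3D world point."""
    xc = (u - CX) * depth / FOCAL
    yc = (v - CY) * depth / FOCAL
    return camera_to_world((xc, yc, depth), cam_pos, yaw)


POINT = (0.5, 0.2, 0.3)                 # one corner of some object in the scene
CAM_A = ((0.0, 0.0, -3.0), 0.0)         # in front of the scene, looking along +Z
CAM_B = ((-3.0, 0.0, 0.0), math.pi / 2) # to the side of the scene, looking along +X

view_a = project(POINT, *CAM_A)
view_b = project(POINT, *CAM_B)
print("pixel in view A:", view_a[:2])
print("pixel in view B:", view_b[:2])

# Both views agree on where the point is in the world.
recovered_a = unproject(*view_a, *CAM_A)
recovered_b = unproject(*view_b, *CAM_B)
print("recovered from A:", recovered_a)
print("recovered from B:", recovered_b)
```

The same point lands on different pixels in the two views, yet both unprojections return the original 3D position up to floating-point error.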
Hangzhou Cracks a Spatial Intelligence Bottleneck! After the Problem Stumped GPT-5, One of the "Six Little Dragons" Steps In
QbitAI (量子位)· 2025-08-27 05:49
Core Viewpoint
- The article discusses the emergence of 3D content generation models, highlighting the unique approach of Qunhe Technology in developing a spatial large model that addresses the core industry pain point of "spatial consistency" [2][7].
Group 1: Current Landscape of 3D Content Generation
- Major players in the 3D content generation space include Google Genie 3 and World Labs, focusing on either video generation or 3D scene generation [5].
- The "video generation faction," represented by Genie 3, can create dynamic interactive content but struggles with maintaining three-dimensional spatial consistency [5].
- The "3D scene generation faction," represented by World Labs and others, can achieve 360-degree roaming but often faces issues with scene collapse and content inconsistencies due to a lack of high-quality 3D data [5][11].
Group 2: Qunhe Technology's Spatial Large Model
- Qunhe Technology's spatial large model aims to overcome the challenges faced by existing models, particularly in terms of spatial consistency and realistic roaming capabilities [8][12].
- The model is characterized by three features: realistic holographic roaming scenes, interactivity, and complex spatial processing capabilities [13].
- Qunhe has released two sub-models: SpatialLM 1.5 (spatial language model) and SpatialGen (spatial generation model), which exemplify these features [14].
Group 3: Spatial Language and Interaction
- Spatial language, as defined by Qunhe, allows the model to describe 3D scenes in terms of spatial parameters, enhancing its ability to support precise spatial generation and editing [21].
- The model can assist robots in understanding complex spatial tasks by incorporating physical parameters and spatial knowledge [19][21].
- Compared to traditional models, SpatialLM 1.5 demonstrates superior performance in spatial understanding and task execution [30][32].
Group 4: Challenges and Industry Context
- The spatial intelligence field is still in its early stages, akin to the GPT-2 phase, facing challenges such as data scarcity, high acquisition costs, and complex scene semantic understanding [32][51].
- Qunhe Technology's strategy involves a "three-in-one" approach, integrating spatial editing tools, spatial synthetic data, and spatial large models to create a positive feedback loop for development [42][45].
- The company has built the largest indoor space deep learning dataset, InteriorNet, with over 441 million 3D models and 500 million structured 3D space scenes, enhancing its competitive edge in the spatial intelligence domain [45].
Group 5: Future Prospects
- The article emphasizes the potential for rapid growth in the spatial intelligence sector, driven by collaborative efforts and open-source initiatives [52].
- Qunhe Technology aims to accelerate the evolution of spatial intelligence and expand the industry by fostering a community of developers and researchers [54].
Qunhe Technology's Huang Xiaohuang: Actively Embracing Open Source to Bring About the "DeepSeek Moment" for Spatial Large Models
IPO早知道· 2025-08-25 13:10
Core Viewpoint
- Qunhe Technology aims to accelerate global spatial intelligence technology through open-source initiatives, showcasing its latest spatial models, SpatialLM 1.5 and SpatialGen, at its first Tech Day event [3][4].
Group 1: Spatial Models
- Qunhe Technology has introduced SpatialLM 1.5, a spatial language model that allows users to generate structured scene scripts and layouts through natural language interactions, addressing limitations of traditional language models in understanding spatial relationships [4][6].
- SpatialGen, a multi-view image generation model, focuses on generating images with temporal and spatial consistency based on text descriptions and 3D layouts, enabling immersive experiences in generated 3D environments [7][8].
Group 2: Open Source Strategy
- The company has been implementing an open-source strategy since 2018, gradually releasing its data and algorithm capabilities to foster innovation in spatial intelligence technology [4][10].
- Qunhe Technology's spatial intelligence ecosystem consists of a "space editing tool - spatial synthesis data - spatial large model" framework, which enhances data accumulation and model training through widespread tool application [4].
Group 3: Data and Model Performance
- As of June 30, 2025, Qunhe Technology possesses over 441 million 3D models and more than 500 million structured 3D spatial scenes, which significantly contribute to the training and performance of its spatial models [4].
- The previous version, SpatialLM 1.0, quickly gained popularity on the Hugging Face trends list after its open-source release, demonstrating the effectiveness of the open-source model [6].
Group 4: AI Video Generation
- The company is developing an AI video generation product that integrates 3D capabilities, aiming to address the challenges of temporal consistency in current AI-generated videos [10].
- Existing AI video creation often suffers from issues like object displacement and spatial logic confusion due to a lack of understanding of 3D structures, which Qunhe Technology seeks to overcome with its new model [10].