Spatial Language
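Video Generation Bids Farewell to "Teleporting and Warping": Behind Qunhe Technology's Rise to the Top of Hugging Face, Spatial Language Rewrites the Rules of AI for the Physical World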
Tai Mei Ti APP · 2025-09-01 03:18
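AIGC technology is extending from text and image generation into the more complex domains of 3D space and video, but existing models generally face two core challenges. First, they understand the spatial structure of the physical world poorly, so the 3D scenes they generate lack logical coherence. Second, viewpoint switching in video creation causes problems with spatiotemporal consistency.

"When will artificial intelligence move from the digital world into the physical world? We believe spatial intelligence is a critical bridge here," said Huang Xiaohuang, co-founder and chairman of Qunhe Technology.

For spatial intelligence, the core task is getting AI to truly understand the "language" of the physical world: teaching AI to describe the world in "spatial language" is its first step into the physical world.

The two newly released models each target one of the two pain points raised at the start of the article. SpatialLM 1.5 uses "spatial language" to generate and interact with 3D scenes in a structured way, while SpatialGen relies on 3D Gaussian technology to keep multi-view images spatially coherent.

The former generates scenes rich in physically correct structured information and lets users generate interactive scenes end to end through the conversational system SpatialLM-Chat, which can effectively address the shortage of robot training data. The latter focuses on "generation and presentation": given a text description, reference images, and a 3D spatial layout, it generates multi-view images with spatiotemporal consistency.

According to the company, traditional multimodal models (such as GPT-4V and Tongyi Qianwen VLM) slice images into visual tokens and align them with text, ...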
Hangzhou Cracks the Bottleneck Problem in Spatial Intelligence! After It Stumped GPT-5, a "Six Little Dragons" Company Steps In
QbitAI · 2025-08-27 05:49
Core Viewpoint
- The article discusses the emergence of 3D content generation models, highlighting the unique approach of Qunhe Technology in developing a spatial large model that addresses the core industry pain point of "spatial consistency" [2][7].
Group 1: Current Landscape of 3D Content Generation
- Major players in the 3D content generation space include Google Genie 3 and World Labs, focusing on either video generation or 3D scene generation [5].
- The "video generation faction," represented by Genie 3, can create dynamic interactive content but struggles with maintaining three-dimensional spatial consistency [5].
- The "3D scene generation faction," represented by World Labs and others, can achieve 360-degree roaming but often faces issues with scene collapse and content inconsistencies due to a lack of high-quality 3D data [5][11].
Group 2: Qunhe Technology's Spatial Large Model
- Qunhe Technology's spatial large model aims to overcome the challenges faced by existing models, particularly in terms of spatial consistency and realistic roaming capabilities [8][12].
- The model is characterized by three features: realistic holographic roaming scenes, interactivity, and complex spatial processing capabilities [13].
- Qunhe has released two sub-models: SpatialLM 1.5 (spatial language model) and SpatialGen (spatial generation model), which exemplify these features [14].
Group 3: Spatial Language and Interaction
- Spatial language, as defined by Qunhe, allows the model to describe 3D scenes in terms of spatial parameters, enhancing its ability to support precise spatial generation and editing [21].
- The model can assist robots in understanding complex spatial tasks by incorporating physical parameters and spatial knowledge [19][21].
- Compared to traditional models, SpatialLM 1.5 demonstrates superior performance in spatial understanding and task execution [30][32].
Group 4: Challenges and Industry Context
- The spatial intelligence field is still in its early stages, akin to the GPT-2 phase, facing challenges such as data scarcity, high acquisition costs, and complex scene semantic understanding [32][51].
- Qunhe Technology's strategy involves a "three-in-one" approach, integrating spatial editing tools, spatial synthetic data, and spatial large models to create a positive feedback loop for development [42][45].
- The company has built the largest indoor space deep learning dataset, InteriorNet, with over 441 million 3D models and 500 million structured 3D space scenes, enhancing its competitive edge in the spatial intelligence domain [45].
Group 5: Future Prospects
- The article emphasizes the potential for rapid growth in the spatial intelligence sector, driven by collaborative efforts and open-source initiatives [52].
- Qunhe Technology aims to accelerate the evolution of spatial intelligence and expand the industry by fostering a community of developers and researchers [54].