Core Insights

- SenseTime Group Limited has officially released and open-sourced NEO, a new multimodal model architecture developed in collaboration with Nanyang Technological University's S-Lab; NEO serves as the foundation for the SenseNova multimodal model [2][3]
- NEO is the industry's first native multimodal architecture to break away from the traditional modular paradigm, achieving deep integration of multimodal capabilities and redefining the performance boundaries of multimodal models [2][3]

Summary by Sections

- NEO Architecture: NEO addresses the limitations of existing modular multimodal models, which typically follow a "visual encoder + projector + language model" structure. While compatible with image inputs, this traditional approach remains language-centric and limits the model's efficiency and capability in complex multimodal scenarios [2]
- Technological Advancements: SenseTime has made significant strides in native multimodal integration training, taking top positions in both the SuperCLUE language evaluation and the OpenCompass multimodal evaluation with a single model. The company has also strengthened the SenseNova model's multimodal reasoning capabilities, achieving a threefold improvement in cost-effectiveness with the release of SenseNova 6.5 [3]
- Open Source Initiative: SenseTime has officially open-sourced models in two sizes based on the NEO architecture, aiming to foster innovation and application within the open-source community. The company is committed to driving next-generation AI infrastructure through collaborative open-source efforts and practical applications [3]
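To make the contrast concrete, the modular "visual encoder + projector + language model" pipeline described above can be sketched as three separately built stages glued together, with vision tokens simply prepended to the text stream. This is a minimal illustrative sketch only; all names, matrices, and dimensions are hypothetical placeholders and have no connection to SenseTime's actual NEO code.

```python
# Toy sketch of the traditional modular multimodal pipeline
# (vision encoder -> projector -> language model). Every name and
# dimension here is a hypothetical placeholder for exposition.
import numpy as np

rng = np.random.default_rng(0)
VIS_DIM, LLM_DIM, VOCAB = 64, 128, 1000

# Each stage is a separate module; the projector is the "glue".
W_enc = rng.standard_normal((VIS_DIM, VIS_DIM))   # stands in for a vision encoder
W_proj = rng.standard_normal((VIS_DIM, LLM_DIM))  # maps vision features into the LLM's space
W_lm = rng.standard_normal((LLM_DIM, VOCAB))      # stands in for the language model head

def modular_vlm(image_patches, text_embeds):
    """Encode image patches, project them, prepend to text, decode with the LLM."""
    vision_tokens = image_patches @ W_enc @ W_proj    # (n_patches, LLM_DIM)
    tokens = np.vstack([vision_tokens, text_embeds])  # language-centric: vision is bolted on
    return tokens @ W_lm                              # logits over the vocabulary

logits = modular_vlm(rng.standard_normal((4, VIS_DIM)),   # 4 hypothetical image patches
                     rng.standard_normal((8, LLM_DIM)))   # 8 hypothetical text embeddings
print(logits.shape)  # (12, 1000)
```

The point of the sketch is that the language model only ever sees projected vision features appended to its input, which is the language-centric limitation a native architecture like NEO is described as removing.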
SenseTime Releases NEO Architecture, Redefining the Performance Boundaries of Multimodal Models