Yuewen (跃问), now Jieyue AI (阶跃AI)

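Is Multimodality the Entry Ticket to AGI? Jieyue Xingchen (阶跃星辰)'s Jiang Daxin: Doubling Down on Foundational Large Models and Exploring Unified Multimodal Understanding and Generation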
QbitAI (量子位) · 2025-05-10 04:40
Core Viewpoint
- Jieyue Xingchen remains committed to researching and developing foundational large models, even as many competitors shift their focus elsewhere. CEO Jiang Daxin stresses that continued investment in foundational models is necessary to keep pace with industry trends and technological advances [1][2].

Group 1: Commitment to Foundational Models
- Jiang Daxin explains that the company does not want to abandon the mainstream growth trend and will keep its focus on foundational model research [2].
- He sees applications and models as complementary: models set the upper limit of what applications can do, while applications supply models with concrete scenarios and data [3].

Group 2: Product Evolution
- Over the past year, the company's products have changed significantly. Its consumer-facing assistant app was rebranded from "Yuewen" to "Jieyue AI," reflecting a shift from a ChatGPT-like product to an agent platform [4].
- The company has released 22 foundational models in two years, 16 of them multimodal, covering text, voice, image, video, and music [10][11].

Group 3: Trends in Large Models
- Jiang Daxin identifies two significant trends in the large model field: the transition from imitation learning to reinforcement learning, and the evolution from multimodal fusion to integrated multimodal understanding and generation [7][9].
- The company aims to achieve integrated multimodal understanding and generation, meaning a single model both understands and generates content across different modalities [12][13].

Group 4: Technical Challenges and Future Directions
- Visual content generation is complex and requires a better understanding of context, because visual modalities are higher-dimensional and continuous compared with language [14].
- The company is working on a scalable architecture for visual understanding and generation, with initial successes in models such as Step1X-Edit [16][17].
- Jiang Daxin expresses confidence in the company's ability to explore multiple technical paths simultaneously, since achieving integrated understanding and generation requires strong capabilities across many modalities [21][22].