Multimodal Large Models
Shengshu Technology completes Series A round of several hundred million RMB, will release a new model version next week
Feng Huang Wang· 2025-09-19 06:42
Group 1
- Shengshu Technology, a multimodal startup, has completed a Series A financing round of several hundred million RMB, led by Bohua Capital with participation from existing investors and new industry partners [1]
- The new funding will go toward model research and technological innovation, with the aim of exploring the limits of intelligence and the breadth of applications of multimodal large models [1]
- CEO Luo Yihang said the funds will also support product expansion, user service, industry collaboration, and global business layout [1]
Group 2
- Shengshu Technology previously completed three financing rounds (angel, angel+, and Pre-A), with notable investors including Qiming Venture Partners, Ant Group, and Baidu's strategic investment arm [1]
- The company recently released the Vidu Q1 reference-to-image model, which competes directly with Google Nano Banana and supports up to seven reference images at once, a breakthrough in multi-subject consistency and high fidelity [1]
- A new version of Vidu is due next week, focusing on image-to-video capabilities [1]
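Shengshu Technology completes Series A round of several hundred million RMB: just released Vidu Q1 reference-to-image, a direct rival to Nano Banana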
IPO早知道· 2025-09-19 02:37
Core Insights
- Shengshu Technology raised several hundred million RMB in its recent Series A round to strengthen model research and technological innovation in multimodal large models [2][3]
- Its core product, Vidu, generates AI images, video, and audio, and targets industries such as internet, advertising, e-commerce, and education [2][3]
Financing and Investment
- The Series A round was led by Liangxi Digital Industry Fund, managed by Bohua Capital, with participation from Baidu's strategic investment arm, Beijing AI Industry Investment Fund, and other existing shareholders [2]
- Liangxi Digital Industry Fund focuses on the artificial intelligence sector, which aligns with Shengshu Technology's ongoing work in the multimodal field [3]
Product Development and Market Impact
- Vidu launched globally in July 2024 and reached annual recurring revenue (ARR) of over $20 million within eight months, covering over 200 countries and regions [3]
- The product has grown quickly, reaching over 30 million users and 6,000 developers and enterprises worldwide [3]
Competitive Landscape
- Vidu is positioned against competitors such as Google Nano Banana, with capabilities in both AI video generation and image creation [3]
Jinqiu Capital portfolio company Shengshu Technology completes a new Series A round of several hundred million RMB | Jinqiu Spotlight
锦秋集· 2025-09-19 02:17
Core Insights
- Jinqiu Capital invested in Shengshu Technology as an early institutional investor in mid-2023 [1]
- Shengshu Technology completed a new financing round of several hundred million RMB, led by Bohua Capital, with participation from investors including Baidu and Qiming Venture Partners [2][5]
- The company focuses on independent research and development of multimodal large models and applications; its core product, Vidu, generates AI images, video, and audio [5][6]
Company Overview
- Shengshu Technology was established in March 2023. Its core team comes from top global universities and from industry, with strong industry experience and a record of deploying technology globally [5]
- Vidu has grown quickly, reaching over 30 million users and 6,000 developers and enterprises across more than 200 countries and regions, and generating over 400 million videos [5][6]
Market Potential
- CEO Dr. Luo Yihang said that commercialization of multimodal generation technology in the digital content industry is accelerating, with significant market space and global growth potential expected over the next three years [6]
- The new financing will be used for model research and technological innovation, as well as product expansion, user service, industry collaboration, and global business layout [6]
Product Development
- Vidu launched globally in July 2024, introduced the concept of reference-to-image and reference-to-video generation, and achieved key breakthroughs in consistency for commercial content creation [5][6]
- More than 100 million reference-based videos and images have been generated, and over 50% of the generated content is commercial material [5]
星动纪元 is hiring! Openings in embodied multimodal, reinforcement learning, and other directions
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article outlines job descriptions and requirements for positions in multimodal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning [6][14][15].
Group 1: Job Descriptions
- Research, design, and implement cutting-edge multimodal reinforcement learning algorithms to address complex real-world problems [6].
- Collect, process, clean, and analyze multimodal data to create high-quality training datasets [14].
- Develop and optimize multimodal models, including training, fine-tuning, and improving performance across different tasks [6][15].
Group 2: Job Requirements
- A master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13].
- Proficiency in programming languages such as Python and deep learning frameworks such as PyTorch, along with strong engineering implementation skills [13].
- Papers published at top academic conferences (e.g., CVPR, NeurIPS) and contributions to open-source projects are preferred [13][19].
Group 3: Additional Qualifications
- Familiarity with multimodal data cleaning, labeling, and loading, and an understanding of data optimization techniques [14].
- Experience with large language models and multimodal models, including knowledge of their capabilities and applicable scenarios [14].
- High standards for data quality and attention to detail, along with proficiency in data processing tools such as Pandas and NumPy [14].
Large-model startups go global, with cloud computing as escort | Innovation Scenarios
Tai Mei Ti APP· 2025-09-16 09:42
Core Insights
- The launch of Sora has made AI video generation a focal point of the global AI landscape, attracting significant attention from capital and media [3]
- Aishi Technology has rapidly developed its video model, PixVerse, which has become one of the largest and fastest video generation models globally, surpassing 60 million users in just two years [3][4]
- The company faces challenges in technology iteration and global expansion, particularly in managing dispersed data and complying with local regulations [3][4][5]
Group 1: Technology and Product Development
- Aishi Technology has released six iterations of its video model, PixVerse, focusing on user experience and generation speed [3][7]
- The company aims to lower the psychological barriers to creating videos by leveraging AI technology [4]
- The multimodal video model requires advanced GPU capabilities and efficient real-time data processing to meet user demand [4][6][7]
Group 2: Global Expansion and Data Management
- Aishi Technology's global operations require aggregating and managing vast amounts of data across regions, which poses challenges in data migration and cost [5][6]
- The partnership with Alibaba Cloud is aimed at addressing these challenges through Alibaba Cloud's extensive global cloud service network [9][10]
- The collaboration includes optimizing cross-region data transfer and enhancing data processing capabilities through advanced cloud solutions [9][10]
Group 3: Cost Efficiency and Resource Utilization
- Aishi Technology seeks to optimize cloud computing costs while maintaining high performance and resource utilization [7][12]
- The company has moved to Alibaba Cloud's Hologres for real-time data analysis, which supports large-scale data processing [9][10]
- The deployment of CADT (Cloud Speed Deployment) has significantly reduced the time and complexity involved in managing cloud applications [14]
Group 4: Future Collaboration and Growth
- Aishi Technology plans to deepen its collaboration with Alibaba Cloud to improve service stability and efficiency for its global AI video generation users [15]
- The partnership will expand across cloud computing, data storage, and large model applications to drive the continued development of AI video generation technology [15]
Topping Apple's app chart! How did Google's viral "Nano Banana" beat ChatGPT?
Zheng Quan Shi Bao· 2025-09-16 07:54
Core Insights
- Google's market capitalization has reached $3 trillion, and its AI application Gemini has surpassed ChatGPT to become the top free app in the Apple App Store [1]
- Gemini has also topped the charts in countries such as Canada, India, and Morocco, breaking the dominance ChatGPT had held since its launch [1]
Group 1: Product Performance
- Gemini's download numbers have exceeded those of ChatGPT, marking a significant shift in the competitive landscape of AI applications [1]
- Gemini's success is attributed to the launch of the image editing product Nano Banana, which has seen over 200 million image edits and attracted over 10 million new users since its release [2][3]
Group 2: Technological Advancements
- Nano Banana improves on previous multimodal models in several ways, including natural language-driven image editing, character consistency, multi-image fusion, and lower barriers to 3D modeling [3][8]
- The model lets users make precise edits with simple natural language commands, improving user experience and accessibility [3]
Group 3: Market Impact
- The positive market response to Nano Banana, together with favorable antitrust rulings, has contributed to a rise in Google's stock price, with analysts raising Alphabet's target price from $225 to $280 [7]
- The success of Nano Banana has sparked competition in image generation, with companies such as ByteDance and Shengshu Technology launching similar models [8][9]
Group 4: Investment Opportunities
- The shift toward multimodal models is expected to create investment opportunities in both computational power and applications, because video reasoning demands significantly more compute than text [9]
- The commercial viability of multimodal products is expected to outpace that of text-based products, marking a pivotal moment in the development of AI applications [9]
Topping Apple's app chart! How did Google's viral "Nano Banana" beat ChatGPT?
证券时报· 2025-09-16 07:51
Core Viewpoint
- Google's market capitalization has reached $3 trillion, and its AI application Gemini has surpassed ChatGPT to become the top app on the Apple App Store [1][2].
Group 1: Gemini's Performance
- Gemini has achieved over 2 million downloads in the US App Store, surpassing ChatGPT, and has also topped the charts in Canada, India, and Morocco [2].
- Gemini's success is attributed to the launch of the image editing product Nano Banana, which has significantly improved image quality and editing control [4].
Group 2: Nano Banana Features
- Nano Banana lets users edit images with simple natural language commands, eliminating the need for traditional editing tools [4].
- The model maintains character consistency across different scenes and actions, which is crucial for brand character creation and script generation [4].
- It supports the fusion of multiple images and draws on world knowledge to understand complex scenes for editing tasks [5].
- Nano Banana lowers the barriers to 3D modeling by generating 2D designs that include essential structural and material information [5].
Group 3: Market Impact and Competitors
- The popularity of Nano Banana has sparked competition in image generation, with companies such as ByteDance and Shengshu Technology launching similar models [10].
- Analysts believe that the native multimodal model architecture is gaining industry recognition, with OpenAI's and Google's models showing advantages in performance and deployment [10].
- Demand for computational power is expected to increase, because native multimodal models have higher requirements than non-native ones [11].
Minglue Technology CEO Wu Minghui to attend the 2025 Tencent Global Digital Ecosystem Conference
Xin Lang Cai Jing· 2025-09-16 03:14
Core Insights
- The evolution of global large model technology is accelerating, and industry applications are steadily deepening [1]
- Vertical large models are becoming the key to implementing AI in enterprises, addressing the limitations of general large models in proprietary data and industry know-how [1]
- Minglue Technology's CEO, Wu Minghui, will present at the Tencent Global Digital Ecosystem Conference on the practical applications of multimodal large models in marketing scenarios [1]
Industry Trends
- The shift toward vertical large models reflects growing recognition of their importance in overcoming the challenges faced by general large models [1]
- The focus on industry-specific applications points to a trend toward more tailored AI solutions that draw on specialized knowledge and data [1]
Company Developments
- Minglue Technology is showcasing its latest technological breakthroughs and practical achievements in AI [1]
- The upcoming presentation at a major conference highlights the company's commitment to advancing AI applications in marketing [1]
Paper walkthrough: HKUST's PLUTO, the first planner to surpass rule-based methods!
自动驾驶之心· 2025-09-15 23:33
Core Viewpoint
- The article discusses the development and features of the PLUTO model within the end-to-end autonomous driving domain, emphasizing its two-stage architecture and its direct encoding of structured perception outputs for downstream control tasks [1][2].
Summary by Sections
Overview of PLUTO
- PLUTO is characterized by three main losses: regression loss, classification loss, and imitation learning loss, which together drive the model's performance [7].
- Additional auxiliary losses are incorporated to aid model convergence [9].
Course Introduction
- The article introduces a new course titled "End-to-End and VLA Autonomous Driving," developed with top algorithm experts from leading domestic manufacturers and aimed at the challenges learners face in this rapidly evolving field [12][15].
Learning Challenges
- The course addresses the difficulties created by fast-moving technology and by knowledge that is fragmented across domains, which make it hard for beginners to grasp the necessary concepts [13].
Course Features
- The course is designed to provide quick entry into the field, build a framework for research capabilities, and combine theory with practical applications [15][16][17].
Course Outline
- The course consists of several chapters covering the history and evolution of end-to-end algorithms, background knowledge on various technologies, and detailed discussions of both one-stage and two-stage end-to-end methods [20][21][22][29].
Practical Application
- The course includes practical assignments, such as RLHF fine-tuning, so that students can apply their theoretical knowledge in real-world scenarios [31].
Instructor Background
- The instructor, Jason, has a strong academic and practical background in cutting-edge algorithms related to end-to-end and large models, contributing to the course's credibility [32].
Target Audience and Expected Outcomes
- The course is aimed at people with a foundational understanding of autonomous driving and related technologies, with the goal of raising their skills to the level of an end-to-end autonomous driving algorithm engineer within a year [36].
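The summary above names PLUTO's loss terms (regression, classification, imitation learning, plus auxiliary losses) without saying how they are defined or combined. The sketch below is only an illustration of the common pattern of summing weighted loss terms. It is not PLUTO's actual formulation: the smooth-L1 regression term, the softmax cross-entropy classification term, the placeholder values for the imitation and auxiliary terms, the weights, and all function names are assumptions made for this example.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth-L1 distance between two equal-length sequences (assumed regression term)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one target class (assumed classification term)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def combine_losses(terms, weights=None):
    """Weighted sum of named scalar loss terms; a missing weight defaults to 1.0."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Hypothetical values: a 3-point trajectory and 3 candidate modes.
terms = {
    "regression": smooth_l1([0.0, 1.0, 2.0], [0.0, 1.5, 4.0]),
    "classification": cross_entropy([2.0, 0.5, -1.0], target_index=0),
    "imitation": 0.30,  # placeholder scalar, not computed here
    "auxiliary": 0.05,  # placeholder scalar, not computed here
}
total = combine_losses(terms, weights={"auxiliary": 0.5})
print(f"total loss: {total:.4f}")
```

Keeping each term as a named scalar makes it easy to log the terms separately and to down-weight the auxiliary losses, which the summary describes only as an aid to convergence.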
Everything about large models and autonomous driving
自动驾驶之心· 2025-09-15 23:33
Group 1
- The article emphasizes the growing interest in large model technologies, particularly RAG (Retrieval-Augmented Generation), AI Agents, multimodal large models (pre-training, fine-tuning, reinforcement learning), and optimization for deployment and inference [1]
- A community named "Large Model Heart Tech" is being established to focus on these technologies and aims to become the largest domestic community for large model technology [1]
- The community is also building a knowledge platform to provide industry and academic information and to cultivate talent in the field of large models [1]
Group 2
- The article describes the community as a serious, content-driven platform aimed at nurturing future leaders [2]