Core Viewpoint
- The year 2025 is anticipated to be remembered as the dawn of the AI industrial era, with many companies racing to invest in AI applications and agent development, but the true competition lies beyond just application-level advancements [1][4].

Group 1: AI Infrastructure and Data Management
- The AI era emphasizes that the foundation for AI applications is robust data infrastructure, which is crucial for building true competitive advantages for companies [3][8].
- Companies need to develop capabilities to handle multimodal data, as the real benefits of the AI era lie not in merely possessing state-of-the-art models but in the ability to continuously manage and nurture them [9][18].
- The industry is entering the "second half" of AI, where the focus shifts to how AI should be utilized and how to measure real progress, necessitating a change in mindset to leverage AI thinking [4][5].

Group 2: Multimodal Data Lakes
- The construction of multimodal data lakes is becoming essential for companies to participate in the agent competition, as it allows for the transformation of previously dormant unstructured data into usable competitive assets [14][21].
- IDC predicts that by 2025, over 80% of enterprise data will be unstructured, highlighting the need to awaken this data to build competitive strength in the agent era [16][19].
- The transition from traditional data lakes to multimodal data lakes is critical, as it enables companies to manage and utilize diverse data types effectively, driving business intelligence and operational efficiency [12][22].

Group 3: Data Infrastructure Evolution
- The evolution of data infrastructure is outlined in three progressive stages: overcoming computing bottlenecks, integrating models into data pipelines, and implementing comprehensive data governance [30][31][33].
- The first stage focuses on breaking through computing limitations by adopting heterogeneous architectures that support both CPU and GPU, ensuring data can be processed quickly and efficiently [30].
- The second stage emphasizes the integration of pre-trained large models into data workflows, allowing for the automatic conversion of multimodal data into usable formats for AI applications [31][32].
- The final stage aims for unified data governance, enhancing the management and activation of data assets while ensuring compliance and security [33][34].

Group 4: Strategic Recommendations for Companies
- Companies should prioritize transforming their data infrastructure from a "storage center" to a "value center," ensuring that data can be quickly accessed and understood by AI models [38][39].
- The focus should be on practical business applications, avoiding the pitfalls of excessive computational power that does not translate into business value [40][41].
- A modular and open data infrastructure is essential for adapting to future uncertainties, allowing companies to upgrade smoothly as technologies evolve [43][44][45].

Group 5: Industry Applications and Impact
- The implementation of multimodal data lakes has shown significant improvements across various industries, such as a 20-fold performance increase in a smart driving company's model training and a 90% efficiency boost in content production for a leading media company [51][59].
- These examples illustrate the necessity of adopting multimodal data strategies to unlock the potential for intelligent transformation across diverse sectors [52][56].
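
As a rough illustration of the second stage under Group 3 (placing pre-trained models inside the data workflow so that multimodal assets become usable records), the following is a minimal sketch, not the approach of any vendor mentioned in the article. Every name in it (`RawAsset`, `EnrichedRecord`, `stub_embed`, `TRANSFORMS`, `run_pipeline`) is hypothetical, and `stub_embed` is only a stand-in for a real model call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RawAsset:
    """An unstructured item stored in the lake (hypothetical schema)."""
    asset_id: str
    modality: str   # e.g. "text", "image", "audio"
    payload: bytes


@dataclass
class EnrichedRecord:
    """Structured output that downstream AI applications could query."""
    asset_id: str
    modality: str
    embedding: List[float]
    tags: Dict[str, str] = field(default_factory=dict)


def stub_embed(payload: bytes, dims: int = 4) -> List[float]:
    """Stand-in for a pre-trained model call; NOT a real embedding.

    Folds the payload bytes into a fixed-length vector so the sketch
    stays deterministic and needs no external dependencies.
    """
    vec = [0.0] * dims
    for i, b in enumerate(payload):
        vec[i % dims] += b / 255.0
    return vec


# Registry mapping each modality to its (hypothetical) model-backed transform.
Transform = Callable[[RawAsset], EnrichedRecord]
TRANSFORMS: Dict[str, Transform] = {}


def register(modality: str) -> Callable[[Transform], Transform]:
    """Decorator that plugs a transform into the pipeline for one modality."""
    def wrap(fn: Transform) -> Transform:
        TRANSFORMS[modality] = fn
        return fn
    return wrap


@register("text")
def enrich_text(asset: RawAsset) -> EnrichedRecord:
    text = asset.payload.decode("utf-8")
    return EnrichedRecord(asset.asset_id, asset.modality,
                          stub_embed(asset.payload), {"chars": str(len(text))})


@register("image")
def enrich_image(asset: RawAsset) -> EnrichedRecord:
    return EnrichedRecord(asset.asset_id, asset.modality,
                          stub_embed(asset.payload),
                          {"bytes": str(len(asset.payload))})


def run_pipeline(assets: List[RawAsset]) -> Tuple[List[EnrichedRecord], List[str]]:
    """Route each asset to its modality's transform; report unsupported ones."""
    enriched: List[EnrichedRecord] = []
    skipped: List[str] = []
    for asset in assets:
        transform = TRANSFORMS.get(asset.modality)
        if transform is None:
            skipped.append(asset.asset_id)  # no model registered for this modality
            continue
        enriched.append(transform(asset))
    return enriched, skipped
```

The registry loosely mirrors the "modular and open" recommendation under Group 4: supporting a new modality means registering one more transform, with no change to `run_pipeline`. In a real system the stub would be replaced by an actual model inference call, which is where the CPU/GPU concerns of the first stage would come into play.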
In the Agent Era, Why Is the Multimodal Data Lake a Must-Have?
机器之心 (Jiqizhixin) · 2026-01-15 00:53