Data Engineering
Lin Zhenyang: Data Is the Nourishment of AI, and the Integration of Data and Intelligence Is an Inevitable Trend
Xin Lang Cai Jing· 2025-12-10 09:16
Core Insights
- The "2025 China Enterprise Competitiveness Conference" highlighted the extension of the "smile curve" theory from industrial manufacturing to the data factor market, indicating that the highest added value is found at both ends of the value chain [2][5]
- AI technology is reshaping the data value chain: value in data transmission and simple processing is diminishing, while high-quality data supply and innovative application scenarios are gaining importance [2][5]
- The traditional data supply model has been restructured, driven by the proliferation of computing power, shifting the upstream data industry from "quantity" to "quality" [2][5]

Data Supply and Application
- The upstream focuses on high-quality data collection and governance, while the downstream emphasizes scenario monetization and value realization [2][5]
- Technologies such as trusted data spaces are addressing challenges in data circulation and control, leading to the emergence of diverse intelligent-integration scenarios such as model factories and MaaS services [2][5]

Data Engineering and AI Integration
- Data engineering is identified as the core of future R&D for AI companies, information technology firms, and traditional manufacturing enterprises [2][5]
- The integration of business, AI, and data is essential for releasing data value, marking a transition from traditional data capabilities to AI-driven data construction [2][5]

Intelligent Transformation Logic
- The core logic of enterprise digital transformation is to build knowledge engineering on top of data, supplying high-quality datasets for AI large-model training [6]
- This process aims to create a closed business loop of "data - model - application - data," enabling iterative improvements in data engineering [6]
- The transformation not only reduces costs and improves efficiency for enterprises but also supports the high-quality construction of data assets in the zero-level market of the data factor market [6]
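The "data - model - application - data" closed loop described above can be sketched in a few lines. This is a purely illustrative toy, assuming hypothetical names (`DataEngineeringLoop`, `train_model`, `iterate`) that do not come from the article; the point is only that gaps surfaced by the deployed application flow back into the dataset for the next training round.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of the "data - model - application - data" closed loop.
# None of these names are from the article; they illustrate the idea that
# application feedback becomes new training data for the next iteration.

@dataclass
class DataEngineeringLoop:
    dataset: List[str] = field(default_factory=list)

    def train_model(self) -> Callable[[str], str]:
        # Stand-in for AI large-model training on the curated dataset.
        known = set(self.dataset)
        return lambda query: "known" if query in known else "unknown"

    def run_application(self, model: Callable[[str], str],
                        queries: List[str]) -> List[str]:
        # Deploying the model in a business scenario surfaces gaps...
        return [q for q in queries if model(q) == "unknown"]

    def iterate(self, queries: List[str]) -> int:
        # ...and those gaps flow back into the dataset, closing the loop.
        model = self.train_model()
        gaps = self.run_application(model, queries)
        self.dataset.extend(gaps)
        return len(gaps)

loop = DataEngineeringLoop(dataset=["invoice formats"])
new_items = loop.iterate(["invoice formats", "contract clauses"])
print(new_items)         # 1 new item fed back into the dataset
print(len(loop.dataset)) # 2
```

Each call to `iterate` covers one more slice of the application's real queries, which is the "iterative improvement in data engineering" the summary describes.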
Philosophy Majors' Job Prospects Have Unexpectedly Taken Off
投资界· 2025-09-11 08:44
Core Viewpoint
- The perception of computer science (CS) as a "safe career choice" is rapidly deteriorating, with rising unemployment rates among graduates indicating a significant shift in the job market [2][3][10]

Group 1: Employment Trends
- The unemployment rate for CS graduates has surged to 6.1%, nearly double that of philosophy graduates, while computer engineering graduates face an even higher rate of 7.5% [5][6]
- Graduates from top institutions like MIT and Stanford have seen employment rates drop from 80% to 70% within two years, with the share entering major tech companies plummeting from 25% to 11-12% [7][10]
- The tech industry has experienced massive layoffs, with over 151,000 jobs cut in 2024 and an additional 22,000 in early 2025, affecting major companies like Google and Microsoft [14][15]

Group 2: Educational Challenges
- The number of CS degrees awarded has more than doubled over the past decade, from 51,000 to 112,000, producing more graduates than there are job openings [16][17]
- Many universities continue to teach outdated curricula that lag the evolving job market, resulting in a mismatch between graduate skills and industry needs [16][17]

Group 3: Market Dynamics
- Companies increasingly prioritize skills and project experience over formal education, with 52% of job postings in early 2024 not requiring any formal degree [20][23]
- Demand for traditional programming roles is declining, while positions in AI, machine learning, data engineering, and cybersecurity are growing at rates well above the average [24]

Group 4: Adaptation Strategies
- Students are encouraged to build portfolios and gain practical experience, as employers care more about problem-solving ability than academic performance [27][28]
- Participation in hackathons and real-world projects provides valuable experience and improves employability, as demonstrated by successful case studies of graduates [27][28]
An X0,000-Character Analysis of Embodied Intelligence Data Engineering | Jinqiu Select
锦秋集· 2025-08-07 15:02
Core Insights
- The article discusses the limitations of embodied artificial intelligence (EAI) data engineering, highlighting the data bottlenecks in the field, particularly low cost efficiency, data silos, and evaluation vacuums [1][5][25]
- A comprehensive framework for EAI data engineering is proposed, aiming to create a systematic, standardized, and scalable data solution for embodied intelligent systems [1][8][10]

Group 1: Data Bottlenecks
- The three core data bottlenecks identified are low cost efficiency, data silos, and evaluation vacuums [25][26][28]
- The data currently available for EAI is far scarcer than that for large language models, at only about one ten-thousandth of the volume [6][25]
- EAI data is unique in that it must capture the spatiotemporal relationships between agent behavior and environmental changes, which complicates data acquisition [6][25]

Group 2: Current Data Production Challenges
- Existing EAI data production methods are fragmented and unsustainable, leading to inconsistent data quality and poor generalizability [7][25]
- A shift toward a systematic engineering approach is needed to design new data production pipelines, making data engineering a foundational aspect of scalable EAI [7][10]

Group 3: EAI Data Engineering Framework
- The proposed framework covers the entire data production lifecycle, focusing on standardization to ensure high-quality, reliable multimodal datasets [8][10]
- Key components include top-level design of data production systems, establishment of unified data standards, and development of technologies for real-world data collection and simulation data generation [10][11]

Group 4: Data Collection Techniques
- Real-world data collection systems fall into two categories, tele-operated and teach-based, each with a distinct operational architecture [29][30]
- Tele-operated systems involve remote control of robots, while teach-based systems record human teaching actions to guide robots [29][30][33]

Group 5: Standardization of EAI Data
- Standardization of EAI data is crucial for addressing data silos and evaluation vacuums, enabling interoperability and quality assessment across diverse datasets [44][68]
- The article classifies EAI datasets into demonstration datasets and embodied question-answering datasets, both essential for training EAI models [45][56]

Group 6: Future Directions
- The framework anticipates applications in industries including manufacturing and services, and emphasizes continuous optimization of data quality and reduction of cost [1][10]
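A unified data standard of the kind the framework calls for can be illustrated with a minimal record schema. This is a hypothetical sketch, not the article's actual standard: the field names (`timestamp_s`, `joint_positions`, `collection_mode`) are assumptions chosen to show how one schema could capture the agent-environment spatiotemporal relationship and let tele-operated and teach-based collections interoperate.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical demonstration-data schema; field names are illustrative
# assumptions, not the standard proposed in the article. One shared schema
# lets tele-operated and teach-based data live in the same dataset.

@dataclass
class DemonstrationFrame:
    timestamp_s: float            # aligns agent behavior with environment state
    joint_positions: List[float]  # proprioceptive state of the robot
    camera_frames: Dict[str, bytes] = field(default_factory=dict)  # e.g. {"wrist": jpeg_bytes}

@dataclass
class DemonstrationEpisode:
    task_description: str   # natural-language goal, pairable with question-answering data
    collection_mode: str    # "tele-operated" or "teach-based"
    frames: List[DemonstrationFrame] = field(default_factory=list)

    def duration_s(self) -> float:
        # A simple quality signal standardization makes comparable across datasets.
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1].timestamp_s - self.frames[0].timestamp_s

ep = DemonstrationEpisode(task_description="pick up the cup",
                          collection_mode="tele-operated")
ep.frames.append(DemonstrationFrame(0.0, [0.0, 0.5]))
ep.frames.append(DemonstrationFrame(2.5, [0.1, 0.4]))
print(ep.duration_s())  # 2.5
```

Because every episode carries the same fields regardless of how it was collected, downstream quality assessment and cross-dataset training can treat heterogeneous sources uniformly, which is the interoperability benefit standardization is meant to deliver.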