Data Engineering
Lin Zhenyang: Data Is the Nourishment of AI, and Data-Intelligence Convergence Is an Inevitable Trend
Xin Lang Cai Jing· 2025-12-10 09:16
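Special topic: 2025 China Enterprise Competitiveness Annual Conference. Editor in charge: Li Siyang (李思阳).

The "2025 China Enterprise Competitiveness Annual Conference" was held in Beijing on December 9 and 10. In his speech, Lin Zhenyang (林镇阳), a recipient of the National Science and Technology Progress Award and chief scientist for data elements and vice president of 软通智慧, argued that extending the "smile curve" theory of industrial manufacturing to the data-element market shows that added value is highest at the two ends of the value chain.

In his view, one end is high-quality data collection and governance on the data supply side, and the other is scenario monetization and value realization on the application side. "AI technology is reshaping the data value chain. The value of data transmission and simple processing in the middle keeps falling, while the supply of high-quality datasets upstream and innovation in data-intelligence convergence scenarios downstream are becoming ever more important."

He said the traditional data supply model has been restructured: widespread access to computing power is driving the upstream data industry from "quantity" to "quality"; midstream, technologies such as trusted data spaces have solved the trust and control problems of data circulation; and downstream, diverse data-intelligence convergence scenarios such as model factories and MaaS services have emerged. In the future, AI will drive data toward a self-circulating, closed-loop ecosystem.

From the micro perspective of the enterprise, Lin pointed out that for AI vendors, IT companies, and traditional manufacturers alike, data engineering is the core of future R&D. "Data is the nourishment of AI, and AI is the driving force of data value." He stressed that data-intelligence convergence is an inevitable trend, and that data is not only the core engine of high-quality development but also a measure of ...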
Philosophy Graduates' Job Prospects Unexpectedly Take Off
投资界· 2025-09-11 08:44
Core Viewpoint
- The perception of computer science (CS) as a "safe career choice" is rapidly deteriorating, with rising unemployment rates among graduates indicating a significant shift in the job market [2][3][10].

Group 1: Employment Trends
- The unemployment rate for CS graduates has surged to 6.1%, nearly double that of philosophy graduates, while computer engineering graduates face an even higher rate of 7.5% [5][6].
- Graduates from top institutions like MIT and Stanford have seen their employment rates drop from 80% to 70% within two years, with the proportion entering major tech companies plummeting from 25% to 11-12% [7][10].
- The tech industry has experienced massive layoffs, with over 151,000 jobs cut in 2024 and an additional 22,000 in early 2025, affecting major companies like Google and Microsoft [14][15].

Group 2: Educational Challenges
- The number of CS degrees awarded has more than doubled over the past decade, from 51,000 to 112,000, leading to a surplus of graduates compared to available job opportunities [16][17].
- Many universities continue to focus on outdated curricula, failing to keep pace with the evolving job market, which has resulted in a mismatch between graduate skills and industry needs [16][17].

Group 3: Market Dynamics
- Companies are increasingly prioritizing skills and project experience over formal education, with 52% of job postings in early 2024 not requiring any formal education [20][23].
- The demand for traditional programming roles is declining, while positions in AI, machine learning, data engineering, and cybersecurity are on the rise, with growth rates significantly exceeding average values [24].

Group 4: Adaptation Strategies
- Students are encouraged to build portfolios and gain practical experience, as employers are more interested in problem-solving abilities than academic performance [27][28].
- Participation in hackathons and real-world projects can provide valuable experience and improve employability, as demonstrated by successful case studies of graduates [27][28].
An X0,000-Character Deep Dive into Embodied Intelligence Data Engineering | Jinqiu Select
锦秋集· 2025-08-07 15:02
Core Insights
- The article discusses the limitations of embodied artificial intelligence (EAI) data engineering, highlighting the challenges posed by data bottlenecks in the field, particularly in cost efficiency, data silos, and evaluation vacuums [1][5][25].
- A comprehensive framework for EAI data engineering is proposed, aiming to create a systematic, standardized, and scalable data solution for embodied intelligent systems [1][8][10].

Group 1: Data Bottlenecks
- The three core data bottlenecks identified are low cost efficiency, data silos, and evaluation vacuums [25][26][28].
- The current data available for EAI is significantly less than that for large language models, with only one ten-thousandth of the data volume [6][25].
- The unique nature of EAI data requires capturing the spatiotemporal relationships between agent behavior and environmental changes, complicating data acquisition [6][25].

Group 2: Current Data Production Challenges
- Existing data production methods for EAI are fragmented and unsustainable, leading to inconsistencies in data quality and generalizability [7][25].
- A shift towards a systematic engineering approach is necessary to design new data production pipelines, making data engineering a foundational aspect of scalable EAI [7][10].

Group 3: EAI Data Engineering Framework
- The proposed EAI data engineering framework encompasses the entire data production lifecycle, focusing on standardization to ensure high-quality, reliable multimodal datasets [8][10].
- Key components of the framework include top-level design of data production systems, establishment of unified data standards, and development of technologies for real-world data collection and simulation data generation [10][11].

Group 4: Data Collection Techniques
- Real-world data collection systems are categorized into tele-operated data collection systems and teach-based data collection systems, each with distinct operational architectures [29][30].
- Tele-operated systems involve remote control of robots, while teach-based systems record human teaching actions to guide robots [29][30][33].

Group 5: Standardization of EAI Data
- Standardization of EAI data is crucial for addressing data silos and evaluation vacuums, facilitating interoperability and quality assessment across diverse datasets [44][68].
- The article outlines the classification of EAI datasets, including demonstration datasets and embodied question-answering datasets, which are essential for training EAI models [45][56].

Group 6: Future Directions
- The framework anticipates applications in various industries, including manufacturing and services, and emphasizes the need for continuous optimization of data quality and cost reduction [1][10].
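
To make the idea of a unified data standard across tele-operated and teach-based collection more concrete, here is a minimal Python sketch of what a shared episode record with basic quality checks might look like. Everything in it is a hypothetical illustration rather than the schema proposed in the article: the names `CollectionMode`, `Frame`, `Episode`, and `validate`, the field layout, and the specific checks are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class CollectionMode(Enum):
    """The two real-world collection modes named in the summary (labels are assumed)."""
    TELE_OPERATED = "tele_operated"
    TEACH_BASED = "teach_based"


@dataclass
class Frame:
    """One time step pairing the agent's state and action with its observations."""
    timestamp_s: float
    agent_state: List[float]            # hypothetical: joint positions, pose, etc.
    action: List[float]                 # hypothetical: commanded control vector
    observations: Dict[str, str] = field(default_factory=dict)  # sensor name -> file reference


@dataclass
class Episode:
    """A single recorded demonstration in a mode-independent layout."""
    episode_id: str
    mode: CollectionMode
    task: str
    frames: List[Frame] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; an empty list means the checks passed."""
        problems: List[str] = []
        if not self.frames:
            problems.append("episode has no frames")
        # Timestamps must strictly increase so the temporal ordering is unambiguous.
        for prev, cur in zip(self.frames, self.frames[1:]):
            if cur.timestamp_s <= prev.timestamp_s:
                problems.append(f"non-increasing timestamp at t={cur.timestamp_s}")
        # State and action vectors should keep the same size across the whole episode.
        dims = {(len(f.agent_state), len(f.action)) for f in self.frames}
        if len(dims) > 1:
            problems.append("inconsistent state/action dimensions")
        return problems


episode = Episode(
    episode_id="demo-001",
    mode=CollectionMode.TELE_OPERATED,
    task="pick up cup",
    frames=[
        Frame(0.0, [0.0, 0.0], [0.1]),
        Frame(0.1, [0.1, 0.0], [0.1]),
    ],
)
print(episode.validate())
```

Because both collection modes would write the same record type in this sketch, datasets from different sources could be checked with the same validator, which is one simple way to think about the interoperability and quality-assessment goals the summary attributes to standardization.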