Alternative Data

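Singapore media: After the firing of the U.S. Bureau of Labor Statistics commissioner, the credibility of U.S. government data is being questioned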
Huan Qiu Shi Bao· 2025-08-31 23:02
Core Viewpoint - The article discusses the erosion of trust in U.S. economic data following actions by the Trump administration, including the dismissal of key officials and the undermining of independent statistical agencies [1][2][3].

Group 1: Impact on Economic Data
- U.S. Department of Labor employment data, a key gauge of the economy's health, came in significantly weaker than expected, after which President Trump dismissed the head of the Bureau of Labor Statistics [1].
- Trump's appointment of a loyalist to lead the Bureau of Labor Statistics raises concerns about the independence and quality of economic data, since the appointee has previously suggested halting employment data releases [2].
- Government budget cuts have led to the disappearance of hundreds of data sets and more than 8,000 government web pages that are essential for public policy and economic analysis [3].

Group 2: Alternative Data Sources
- Some institutional investors have begun using alternative data, such as satellite imagery, to gain insight into economic performance, indicating a shift in how market participants assess economic conditions [4].
- Reliance on alternative data raises concerns about market fairness, because access to such data is often limited to wealthier investors, creating disparities in information availability [4].
- Although advances in technology are making alternative data more accessible, it is still years away from fully replacing traditional economic data collection methods [4].
Jinqiu Select | Physical Intelligence co-founder: Real data is irreplaceable for AI training
锦秋集· 2025-07-22 15:04
Core Viewpoint - Over-reliance on alternative data sources can severely limit the ultimate capabilities of models, and true breakthroughs must be built on real data [1][10]

Group 1: The Dilemma of Alternative Data
- Researchers in robotics often seek cheaper alternatives to real data due to high collection costs, leading to a compromise in model performance [2][3]
- Common alternative methods include simulation training, learning from human videos, and using handheld devices to mimic robotic actions, but each method ultimately weakens the model's true potential [3][4]

Group 2: Intersection Dilemma
- The collection of data inevitably involves human judgment, which can limit the problem-solving approach when avoiding real data [4][6]
- As models grow stronger, they can better distinguish between alternative and real data, leading to a smaller intersection of effective behaviors [6][7]

Group 3: The Importance of Real Data
- Attempting to bypass real data results in a "spork" scenario, where neither alternative data nor real data is effectively utilized [10][11]
- To build robust robotic models that generalize well, real data is essential, but it can be complemented with diverse data sources [11][12]

Group 4: The "Spork" Phenomenon
- The concept of "spork" applies to various AI research areas, where attempts to combine manual design with learning systems ultimately create performance bottlenecks [13]
On robot data, reinforcement learning heavyweight Sergey Levine has just written a good article
机器之心· 2025-07-22 04:25
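A 机器之心 report, by the 机器之心 editorial team

Training large models is inherently challenging, and as models grow in scale and expand into new application domains, the difficulty keeps rising and the amount of data required becomes enormous.

Large language models (LLMs) rely mainly on large volumes of text data, and vision-language models (VLMs) need data that contains both text and images. In robotics, vision-language-action models (VLAs) require large amounts of data from robots performing tasks in the real world.

For now, agents are an important transitional step on the way to artificial general intelligence (AGI). Training an agent requires real interaction data with action labels, and obtaining such data costs far more than collecting text and images from web pages.

Researchers have therefore kept looking for an alternative that offers the best of both worlds: lowering the cost of data acquisition while preserving training results and keeping the benefits of large-scale data that are common in foundation model training.

Sergey Levine, associate professor at the University of California, Berkeley, co-founder of Physical Intelligence, and a leading figure in reinforcement learning, has written an article on this question that analyzes the data mix used to train large models. He argues, however, that you cannot have it both ways: the "spork", a combination of fork and spoon, is hard to call genuinely useful in general settings.

Alternative data

Although in visual perception and natural language processing tasks real-world data has always been regarded ...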