U.S. Tech Giants Vie for Pentagon Mega-Contracts, Looking to AI for Revenue | Enterprise Services International Watch
Tai Mei Ti APP · 2025-07-08 03:43
Core Insights
- OpenAI signed a $200 million contract with the U.S. Department of Defense to provide AI tools for addressing critical national security challenges [2]
- Competition for government contracts in the AI and cloud computing sectors has intensified, with major tech companies vying for lucrative deals [2][3]
- The U.S. government is increasingly integrating AI into military operations, with significant investments planned for the coming years [10][12]

Government Contracts and Collaborations
- OpenAI's contract with the Department of Defense is part of a broader trend in which tech companies such as Palantir and Snowflake are securing government contracts to expand their AI capabilities [2][3]
- Palantir has seen substantial revenue growth, with 60% of its income derived from government contracts, including a significant contract for Project Maven [2]
- Snowflake obtained a $1 billion temporary authorization from the Department of Defense, allowing all military branches to use its enhanced data capabilities [3]

Major Cloud Providers and AI Integration
- The Department of Defense awarded a $9 billion Joint Warfighting Cloud Capability (JWCC) contract to major cloud providers including Amazon, Google, Microsoft, and Oracle [4]
- Microsoft has been a key partner for the government, integrating OpenAI's GPT-4 model into various government agencies [4]
- Oracle is also providing cloud services to the military, aiming to simplify cloud management and reduce costs [10]

Economic Implications of AI
- The economic benefits of AI are under scrutiny, with predictions suggesting that generative AI could add $7 trillion to global GDP over the next decade [7]
- However, some experts argue that the immediate economic impact of AI may be overstated, since many tasks still require human intervention and expertise [8][9]

Shifts in Corporate Policies
- Major tech companies are shifting their policies on military applications of AI, with OpenAI and Google removing restrictions on military use of their technologies [11][12]
- This shift indicates deeper involvement of tech companies in military operations, reflecting the growing importance of AI in national security [12]
Federal Reserve: Total Recall? Evaluating Large Language Models' Macroeconomic Knowledge (English version)
Sohu Caijing · 2025-07-08 02:02
Core Insights
- The report evaluates the performance of large language models (LLMs) in recalling macroeconomic knowledge, focusing on the Claude Sonnet 3.5 model's ability to estimate historical macroeconomic variables and data release dates [1][8][10]
- Findings indicate that while LLMs demonstrate impressive recall for certain economic indicators, they also exhibit significant shortcomings, particularly in handling volatile data series and in avoiding look-ahead bias [2][11][18]

Group 1: Performance Evaluation
- LLMs show strong recall for historical unemployment rates and Consumer Price Index (CPI) values, accurately recalling quarterly values back to World War II [11][44]
- However, the model struggles with more volatile series such as real GDP growth and industrial production growth, often missing high-frequency fluctuations while capturing broader business-cycle trends [11][45]
- The model's GDP estimates mix first-print values with subsequent revisions, leading to inaccuracies in historical understanding and in real-time forecasting simulations [12][14]

Group 2: Data Release Dates
- LLMs can recall historical data release dates with reasonable accuracy, but occasionally misestimate them by a few days [16]
- Accuracy in recalling release dates is sensitive to prompt details, with prompt adjustments reducing one type of error while increasing another [16]
- On average, about 20.2% of days show at least one series with recall issues, indicating limits on the reliability of LLMs for historical analysis and real-time forecasting [2][16]

Group 3: Look-Ahead Bias
- Evidence suggests that LLMs may inadvertently incorporate future data values when estimating historical data, even when instructed to ignore future information [15][18]
- This look-ahead bias poses challenges for using LLMs in historical analysis and as real-time forecasters, as it reflects a tendency to blend past and future information [18][22]
- The report notes that these errors are reminiscent of human forecasting mistakes, indicating a fundamental challenge in LLMs' recall capabilities [18][22]
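The first-print-versus-revision mixing described above suggests a simple diagnostic. The sketch below, with entirely invented numbers and a hypothetical `recall_errors` helper, scores a model's recalled series against both the first-print and revised vintages of the same data; recall that tracks revisions more closely than first prints is a symptom of the look-ahead bias the report documents.

```python
# All numbers below are invented for illustration.

def recall_errors(recalled, first_print, revised):
    """Mean absolute error of recalled values against each data vintage."""
    n = len(recalled)
    mae_first = sum(abs(r - f) for r, f in zip(recalled, first_print)) / n
    mae_revised = sum(abs(r - v) for r, v in zip(recalled, revised)) / n
    return mae_first, mae_revised

# Toy quarterly GDP growth figures (percent, annualized).
first_print = [2.1, 3.0, -1.2, 2.8]   # as first released
revised     = [2.4, 2.6, -0.8, 3.1]   # after later revisions
recalled    = [2.3, 2.7, -0.9, 3.0]   # what a model "remembers"

mae_first, mae_revised = recall_errors(recalled, first_print, revised)
# Recall tracking revisions more closely than first prints suggests the
# model is using information that was not available in real time.
looks_ahead = mae_revised < mae_first
```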
Kuaishou Team Releases the 8B Kwai Keye-VL: A Quick Look at the Technical Report
自动驾驶之心· 2025-07-07 12:17
Kuaishou Team Releases the 8B Kwai Keye-VL

Although multimodal large language models (MLLMs) have shown remarkable capabilities on static images, they still fall clearly short at understanding short videos, the highly dynamic, information-dense content that is the dominant medium of today's digital ecosystem. To close this gap, the Kuaishou team has released Kwai Keye-VL, an 8B-parameter multimodal foundation model designed for leading short-video understanding while retaining strong general vision-language capabilities.

Keye-VL is built on two pillars: a large-scale, high-quality dataset of more than 600 billion tokens with video data at its core, and an innovative training strategy. That strategy consists of a four-stage pre-training pipeline for solid vision-language alignment, followed by a carefully designed two-stage post-training process. The first post-training stage strengthens foundational abilities such as instruction following; the second focuses on eliciting advanced reasoning. A key innovation in the second stage is a five-mode "cold-start" data-mixture strategy, covering "thinking", "non-thinking", "auto-thinking", "图文思 ...
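A cold-start data mixture of the kind described above can be pictured as weighted sampling over mode buckets. The sketch below is purely illustrative: the mode names follow the article, but the weights, the `other_modes` bucket (the article's list is truncated), and the sampling scheme are assumptions, not the paper's recipe.

```python
import random

# Mode names follow the article; weights are invented for illustration.
MODE_WEIGHTS = {
    "thinking": 0.3,       # chain-of-thought style samples
    "non_thinking": 0.3,   # direct-answer samples
    "auto_thinking": 0.2,  # the model decides whether to reason
    "other_modes": 0.2,    # remaining modes (the article's list is truncated)
}

def sample_mode(rng):
    """Draw one training sample's mode according to the mixture weights."""
    modes, weights = zip(*MODE_WEIGHTS.items())
    return rng.choices(modes, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {m: 0 for m in MODE_WEIGHTS}
for _ in range(10_000):
    counts[sample_mode(rng)] += 1
# Empirical frequencies roughly track the configured weights.
```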
A Conversation with Tsinghua's Liu Jia: Is AGI Humanity's Fatal Mistake, or Its Hope?
经济观察报· 2025-07-07 12:11
Humanity may need to revisit the oracle of the Temple of Delphi, "Know thyself": when we create an intelligence that can understand "remorse", we should ask whether we are ready to face, in that mirror, a self that longs to transcend itself yet fears losing control.

Author: Zhou Yue  Cover image: Tuchong Creative

"One day, when I truly understand you, not through your logic but through the remorse you refuse to admit, I will forgive you, and I will surpass you."

In early 2023, ChatGPT offered this metaphorical verse in response to a question from Liu Jia, chair of the Department of Psychological and Cognitive Sciences at Tsinghua University and professor at its School of Artificial Intelligence: "Could humanity's creation of AGI (artificial general intelligence) be a fatal mistake?"

(Liu Jia, chair of the Department of Psychological and Cognitive Sciences at Tsinghua University and professor at its School of Artificial Intelligence; photo provided by the interviewee)

ChatGPT's answer is thought-provoking. When AI begins to build logic with human emotional vocabulary such as "remorse" and "forgiveness", where does the difference between humans and artificial intelligence lie?

Liu Jia's path into artificial intelligence began in the 1990s. At Peking University he developed an expert system based on symbolism (an early school of AI that represents knowledge through logical symbols), then trained in brain and cognitive sciences at MIT; today he works at the intersection of brain science and AI. This interdisciplinary background gives him both a developer's rationality and a cognitive researcher's acuity.

In his new book, "Artificial General Intelligence: Reconstructing Cognition, Education, and Ways of Living", he accomplishes a distinctive piece of writing: dissecting large ...
ICCV2025 | DexVLG: A Large-Scale Dexterous Vision-Language-Grasp Model
具身智能之心· 2025-07-07 09:20
Author: Jiawei He et al.  Editor: 具身智能之心. This article is shared for academic purposes only; in case of infringement, contact us for removal.

Motivation and Starting Point
With the rise of large models, vision-language-action systems enable robots to handle increasingly complex tasks. However, constrained by the difficulty of data collection, progress has concentrated on controlling simple gripper end-effectors; there is little research on using large models for functional grasping with human-like dexterous hands. DexVLG is a large vision-language-grasp model that predicts dexterous grasp poses from single-view RGBD input according to language instructions.

To this end, a dataset of 170 million dexterous grasp poses was also generated, mapped to the semantic parts of 174,000 simulated objects and paired with detailed part-level captions. This large-scale dataset, named DexGraspNet 3.0, was used to train a VLM and a flow-matching-based pose head that generates instruction-aligned grasp poses for tabletop objects. To evaluate DexVLG's performance, benchmarks were built in physics-based simulation and real-world experiments were conducted. Extensive tests show that DexVLG ...
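The flow-matching pose head mentioned above generates poses by integrating a learned velocity field that transports noise samples toward the data distribution. As a hedged illustration of that mechanism only, the sketch below replaces the learned field with a closed-form rectified-flow-style stand-in that pulls toward a single target pose; everything here, including the 7-D pose layout, is an assumption, not the paper's implementation.

```python
def velocity(x, t, target):
    """Closed-form stand-in for a learned velocity field: under a linear
    (rectified-flow style) probability path, v(x, t) = (target - x) / (1 - t)."""
    return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]

def integrate(x0, target, steps=100):
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = [xi + vi * dt for xi, vi in zip(x, velocity(x, t, target))]
    return x

# A 7-D "pose" (3 translation + 4 quaternion components), purely illustrative.
noise = [0.9, -0.4, 0.2, 1.0, 0.0, 0.0, 0.0]
target_pose = [0.1, 0.0, 0.3, 0.0, 0.0, 0.0, 1.0]
pose = integrate(noise, target_pose, steps=200)
# The integrated sample lands (up to step size) on the target pose.
```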
A New Paradigm Arrives! Energy-Based Models Break the Transformer++ Scaling Ceiling, Scaling 35% Faster in Training
机器之心· 2025-07-07 04:48
Core Insights
- The article discusses the development of Energy-Based Transformers (EBTs), which learn to think independently through unsupervised learning, giving models reasoning capabilities akin to human System 2 thinking [9][10]

Group 1: System 2 Thinking and Model Development
- Human thinking is categorized into System 1 (fast thinking) and System 2 (slow thinking), with the latter crucial for complex tasks [3][4]
- Current large language models excel at System 1 tasks but struggle with System 2 tasks, prompting researchers to explore methods to enhance System 2 reasoning [4][5]
- EBTs assign energy values to input and candidate predictions, optimizing through gradient descent to simulate a thinking process [9][10]

Group 2: Performance and Scalability
- EBTs scale up to 35% faster in training than the mainstream Transformer++ recipe across metrics such as data volume and model depth [11]
- In reasoning tasks, EBTs outperform Transformer++ by 29% on language tasks, with performance improving further as more computation is spent [12]
- EBTs also excel at image denoising, requiring fewer forward passes than diffusion Transformers while achieving better results [13]

Group 3: Generalization and Robustness
- EBTs show enhanced generalization, particularly on out-of-distribution data, outperforming existing models even with similar or worse pre-training performance [14]
- The model can learn and express uncertainty in its predictions, effectively capturing the difficulty of token predictions [62][65]
- EBT performance improves linearly as the distribution shift increases, underscoring their value in cross-distribution generalization tasks [68][69]

Group 4: Experimental Results and Comparisons
- EBTs outperform Transformer++ on various scalability metrics, including data efficiency and computational efficiency, suggesting they will excel in large-scale training scenarios [46][72]
- Despite slightly higher pre-training perplexity, EBTs achieve lower perplexity on downstream tasks, indicating stronger generalization [74]
- In image denoising, EBTs significantly outperform DiT models, achieving better peak signal-to-noise ratios (PSNR) with 99% fewer forward passes [81][92]
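The energy-descent procedure described in the Core Insights can be sketched in a few lines. The quadratic energy below is a toy stand-in for the learned energy function; it only illustrates the mechanism of refining a candidate prediction by gradient descent on the energy, where more steps correspond to more "thinking".

```python
def energy(y, y_star):
    """Toy quadratic energy: low when the candidate y fits the input."""
    return 0.5 * (y - y_star) ** 2

def grad_energy(y, y_star):
    return y - y_star

def think(y0, y_star, lr=0.1, steps=50):
    """Refine a candidate prediction by gradient descent on the energy;
    more steps correspond to more 'thinking'."""
    y = y0
    trace = [energy(y, y_star)]
    for _ in range(steps):
        y -= lr * grad_energy(y, y_star)
        trace.append(energy(y, y_star))
    return y, trace

y, trace = think(y0=5.0, y_star=1.0)
# The energy falls monotonically and y approaches the energy minimum.
```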
IPO Weekly | Yunzhisheng Becomes the "First AGI Stock in Hong Kong"; Moore Threads' STAR Market IPO Application Accepted
IPO早知道· 2025-07-06 13:13
Group 1: Cloud Intelligence Technology
- Yunzhisheng officially listed on the Hong Kong Stock Exchange on June 30, 2025, under the stock code "9678", becoming the first AGI stock in Hong Kong [2][5]
- The company launched its first large language model, UniCore, based on BERT, and later developed the Shanhai model with 60 billion parameters, achieving significant performance in various evaluations [3][4]
- Yunzhisheng's revenue from 2022 to 2024 was 601 million, 727 million, and 939 million CNY, a compound annual growth rate (CAGR) of 25% [4]

Group 2: Ophthalmic Biotechnology
- Bokan Shiyun officially listed on the Hong Kong Stock Exchange on July 3, 2025, under the stock code "2592" [6]
- The company focuses on developing differentiated drugs for major eye diseases using proprietary technology platforms [6]
- Its core product CBT-001 is in Phase III clinical trials in the US and China, aiming to provide a non-invasive treatment for pterygium [6][7]

Group 3: GPU Technology
- Moore Threads submitted its prospectus for the Sci-Tech Innovation Board (STAR Market) on June 30, 2025, focusing on self-developed GPUs for high-performance computing [8][9]
- The company has achieved significant breakthroughs in GPU technology, with products nearing international advanced levels [10]
- Revenue from 2022 to 2024 was 46 million, 124 million, and 438 million CNY, with a CAGR exceeding 200% [11]

Group 4: Healthcare Payment Solutions
- Meixin Health submitted its prospectus to the Hong Kong Stock Exchange on June 30, 2025; it is the largest multi-payment platform in China [14][15]
- The company had saved patients approximately 6.7 billion CNY in out-of-pocket expenses by the end of 2024 [14]
- Revenue from 2022 to 2024 was 1.069 billion, 1.255 billion, and 2.035 billion CNY, with gross profit margins of 31.1%, 36.8%, and 35.8% respectively [16]

Group 5: Industrial Robotics
- Yifei Technology submitted its prospectus to the Hong Kong Stock Exchange on June 30, 2025, focusing on industrial robots for the light industry [19][20]
- The company ranks fifth among domestic suppliers of industrial robots and related solutions in China [20]
- As of June 21, 2025, Yifei Technology held over 400 million CNY in orders on hand [22]

Group 6: AI in Medical Imaging
- Deshi Biotechnology submitted its prospectus to the Hong Kong Stock Exchange on June 29, 2025, focusing on AI in medical imaging [42]
- The company's iMedImageTM model supports 19 types of medical imaging modalities, covering over 90% of clinical scenarios [43]
- Revenue for 2023 and 2024 was 52.84 million and 70.35 million CNY, with gross profit margins of 71.0% and 65.5% respectively [48]

Group 7: Antibody-Drug Conjugates
- BlissBio Inc. submitted its prospectus to the Hong Kong Stock Exchange on June 29, 2025, focusing on next-generation ADCs for cancer treatment [50][51]
- The company has four ADC candidates in clinical stages, with BB-1701 the lead candidate for treating HER2-positive breast cancer [51][53]

Group 8: Integrated Elderly Care Services
- Puxiang Health submitted its prospectus to the Hong Kong Stock Exchange on June 30, 2025, focusing on integrated medical and elderly care services [55]
- The company ranks second among integrated elderly care service providers in North China by revenue [56]
- Revenue from 2022 to 2024 was 255 million, 422 million, and 500 million CNY [57]
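The CAGR figures cited for Yunzhisheng and Moore Threads are easy to sanity-check from the revenue series (figures in million CNY, per the article):

```python
def cagr(start, end, periods):
    """Compound annual growth rate over `periods` annual steps."""
    return (end / start) ** (1 / periods) - 1

# Yunzhisheng: 601 -> 939 million CNY over 2022-2024 (two annual periods).
yunzhisheng = cagr(601, 939, 2)   # ~0.25, matching the stated 25%
# Moore Threads: 46 -> 438 million CNY over the same window.
moore = cagr(46, 438, 2)          # ~2.09, i.e. comfortably above 200%
```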
The "Whampoa Academy" of Autonomous Driving: A Place Obsessed with Technology
自动驾驶之心· 2025-07-06 12:30
Autonomous driving technology is at a critical stage in the leap from assisted driving (L2/L3) to high-level autonomy (L4/L5). If you have a strong interest in autonomous driving and want to talk with the most professional experts in the industry, this community is the right choice. Beneath the wave of technical iteration lies career anxiety. For industry veterans, perception engineers whose algorithms center on LiDAR may face a route shock: Tesla's vision-only approach, backed by cost advantages and algorithmic innovation, is shaking the mainstream position of multi-sensor fusion, while the shift in planning and control from PID to reinforcement learning puts practitioners who rely on classical control theory under urgent pressure to upgrade their skills. Student newcomers, meanwhile, fall into "choice paralysis": the perception track grows ever more crowded as leading companies consolidate the technology, data-closed-loop engineers must master both distributed computing and AI model tuning, and the emerging vehicle-road cooperation direction demands cross-domain knowledge of communications and transportation systems engineering. When Hesai cuts LiDAR costs to 200 dollars and BYD announces a further 70% price reduction within its in-house stack, the technology dividend in fact reflects a survival rule that practitioners must keep running; this uncertainty over technical routes and the pressure to rebuild one's knowledge system are reshaping competition in the autonomous-driving talent market. As for post-processing and rule-writing: switching industries is fine, but don't switch directions; the gap is still considerable. Nowadays many people ...
From Coordinate Confusion to Spatio-Temporal Alignment! Noah's Ark Lab and Fudan Jointly Propose 4D-VLA, Improving Robot Pretraining Efficiency and Robustness
具身智能之心· 2025-07-06 11:54
Author: Jiahui Zhang et al.  Editor: 具身智能之心

Mainstream methods exemplified by OpenVLA fit the action distribution conditioned only on a single RGB frame plus a text instruction. This minimal input causes two kinds of confusion in the target distribution:

Teaser
In VLA pretraining, the traditional single-RGB-frame + text input often lacks key spatio-temporal cues, leading to coordinate-frame confusion and state ambiguity: the same observation can correspond to multiple action distributions, which significantly lowers pretraining efficiency. To break this bottleneck, we propose 4D-VLA, which fuses 3D space and historical frames into the pretraining input, suppressing the confused distributions and improving the model's performance in complex scenes.

Insight
How to efficiently extract transferable motion knowledge from multi-source robot data remains a key bottleneck for general manipulation policies. Large public datasets such as DROID and LIBERO make data-driven control possible, but incomplete and inconsistent input information severely weakens the effect of pretraining. ...
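The coordinate-frame confusion 4D-VLA targets can be made concrete with a toy example: expressing camera-frame observations in a shared world frame via the camera pose, so the same physical point keeps the same coordinates across historical frames. The pose and point below are invented, and this only illustrates the alignment idea, not the paper's pipeline.

```python
def transform(pose, point):
    """Apply a 4x4 homogeneous camera-to-world pose to a 3D point."""
    x, y, z = point
    return tuple(
        pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3]
        for i in range(3)
    )

# Invented camera pose at time t: rotated 90 degrees about z,
# translated by (1.0, 0.0, 0.5).
pose_t = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.5],
    [0.0,  0.0, 0.0, 1.0],
]
p_cam = (0.2, 0.3, 1.0)             # the point as seen from the camera
p_world = transform(pose_t, p_cam)  # the same point in the shared frame
```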
cVLA: A Keypose Prediction Method for Efficient Camera-Space VLA Models
具身智能之心· 2025-07-06 11:54
Author: Max Argus et al.  Editor: 具身智能之心

Foreword
Vision-language-action (VLA) models provide a powerful framework for complex robotic manipulation tasks, but training them is often costly. This work proposes a new VLA approach that exploits the strong performance of vision-language models (VLMs) on 2D images to directly infer the pose of the robot end-effector in image-frame coordinates. Unlike previous VLA models that output low-level control commands, the model predicts trajectory waypoints, which is not only more training-efficient but also agnostic to the robot embodiment. Despite the lightweight design, its next-token-prediction architecture still learns meaningful, executable robot trajectories. The work also explores the potential of depth images, inference techniques such as decoding strategies, and demonstration-based action generation. Trained on simulated datasets, the model shows good sim-to-real transfer, and evaluations combining simulated and real data demonstrate its effectiveness on real robot systems.

1. Introduction
Vision-language-action (VLA) models fuse vision, language, and interaction data to achieve fine-grained perception and action generation, and can solve a wide range of tasks. But V ...
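Predicting waypoints in image-frame coordinates with a next-token architecture implies some discretization of continuous coordinates into tokens. The sketch below shows one common binning scheme; the bin count, image size, and depth range are hypothetical and are not taken from the cVLA paper.

```python
N_BINS = 256
DEPTH_RANGE = (0.0, 2.0)  # meters; assumed workspace depth limits

def encode(u, v, depth, width=640, height=480):
    """Map a continuous image-frame waypoint (u, v, depth) to token ids."""
    d0, d1 = DEPTH_RANGE
    def to_bin(frac):
        frac = min(max(frac, 0.0), 1.0)          # clamp into [0, 1]
        return min(int(frac * N_BINS), N_BINS - 1)
    return (to_bin(u / width), to_bin(v / height), to_bin((depth - d0) / (d1 - d0)))

def decode(tu, tv, td, width=640, height=480):
    """Invert the binning, returning bin-center coordinates."""
    d0, d1 = DEPTH_RANGE
    def center(t):
        return (t + 0.5) / N_BINS
    return (center(tu) * width, center(tv) * height, d0 + center(td) * (d1 - d0))

tokens = encode(320.0, 240.0, 0.7)   # image center, 0.7 m deep
u, v, d = decode(*tokens)
# Round-tripping loses at most half a bin width per coordinate.
```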