量子位
Apps inside ChatGPT! A full tour of OpenAI's Developer Day: agent toolchain, app ecosystem, and model APIs all launched at once
量子位· 2025-10-07 04:43
Core Insights
- OpenAI's Developer Day 2025 featured noticeably more product releases than in previous years, a sign of how quickly its AI capabilities and offerings are evolving [1]

Group 1: New Features and Tools
- ChatGPT now embeds third-party apps, so users can work with apps such as Coursera and Spotify directly in the chat interface [2][13]
- AgentKit gives developers a toolkit for building, deploying, and optimizing agents, with modules such as Agent Builder and Connector Registry [4][23]
- Codex, OpenAI's AI programming tool, gains Slack integration and a Codex SDK, so tasks can be delegated to it and it can be integrated into existing workflows [8][29]

Group 2: Developer Support and SDKs
- OpenAI has launched the Apps SDK, which lets developers build and test applications that connect to ChatGPT; a submission and review process is planned for later this year [18][20]
- The Agent Builder module in AgentKit lets developers assemble agents visually instead of starting from scratch [8][25]
- The Connector Registry provides centralized management of data and tool connections across OpenAI products [24][27]

Group 3: Pricing and Model Comparisons
- The GPT-5 Pro API is now available, priced at $15 per million input tokens and $120 per million output tokens, a premium position in the market [34][35]
- In the pricing comparison, GPT-5 Pro comes in at $15 while other models such as o3-pro are priced higher at $20 [38]
- GPT-Realtime-Mini, a smaller and cheaper voice model, offers similar performance at a 70% lower price, aimed at budget-conscious developers [40]
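The GPT-5 Pro prices quoted above can be turned into a quick per-request estimate. The sketch below is illustrative arithmetic only: the two rates come from the summary, while the helper name `estimate_cost_usd` and the example token counts are hypothetical and not part of any OpenAI API.

```python
# Per-million-token prices for GPT-5 Pro as quoted in the summary (USD).
INPUT_PRICE_PER_M = 15.0
OUTPUT_PRICE_PER_M = 120.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper for illustration)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


def output_to_input_ratio() -> float:
    """How many times more expensive an output token is than an input token."""
    return OUTPUT_PRICE_PER_M / INPUT_PRICE_PER_M


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens and 5,000 output tokens.
    print(f"${estimate_cost_usd(20_000, 5_000):.2f}")  # prints $0.90
    print(output_to_input_ratio())                      # prints 8.0
```

At these rates an output token costs eight times as much as an input token, so long generations dominate the bill even when prompts are large.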
2025 AI Annual Awards now open! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位· 2025-10-07 04:43
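From the organizing committee, reporting from Aofeisi (凹非寺)
量子位 | WeChat official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to applaud and encourage more peers on the same path, we are officially opening applications for the "2025 AI Annual List" (2025人工智能年度榜单).

This is the 8th year of 量子位's AI annual list. Over these eight years we have witnessed technical breakthroughs and real-world deployment, the convergence and reshaping of industries, and wave after wave of companies, people, and products that move the era forward.

In an era in which AI is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection we hope to find and honor the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions (companies, products, and people) and five award categories. Companies are welcome to apply! Let us witness the stars of the year together and light the way forward.

Company list
- 2025 AI Leading Company of the Year
- 2025 AI Promising Startup of the Year
Product list
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution of the Year
People list
- 2025 AI Person in Focus of the Year

Detailed selection criteria and application instructions follow.

2025 AI Leading Company of the Year
Open to China's AI sector; selects the companies with the strongest overall capabilities. Eligibility: Selection criteria:

2025 AI Promising Startup of the Year
Focuses on innovation and entrepreneurship in China's AI sector; selects the AI startups with the greatest investment value and growth potential. Eligibility: Selection criteria: ...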
OpenAI secures a 10% equity stake as AMD surges $63.4 billion overnight
量子位· 2025-10-07 04:43
Core Viewpoint
- OpenAI has entered into a strategic partnership with AMD, committing to deploy a total of 6GW of AMD GPU computing power over the coming years, with the first 1GW set to be deployed in the second half of 2026 [2][10].

Group 1: Partnership Details
- OpenAI will deploy a total of 6GW of AMD GPU computing power, starting with 1GW in late 2026 and gradually expanding across multiple generations of AMD's Instinct products [2][10].
- AMD has granted OpenAI warrants to purchase up to 160 million shares at $0.01 per share, which would give OpenAI approximately 10% of AMD's equity if fully exercised [3][5][15].
- Exercising the warrants is contingent on specific milestones, including completion of the first 1GW deployment and AMD reaching certain stock price targets [13][14].

Group 2: Market Impact
- After the announcement, AMD's market capitalization surged from approximately $267.2 billion to $330.6 billion, and further gains pushed it above $340 billion [6].
- OpenAI's stake in AMD can be read as a strategic move to reduce its reliance on NVIDIA, historically its primary supplier of computing power [17][19].
- The partnership is expected to generate significant revenue for AMD, potentially amounting to hundreds of billions, while letting AMD capture a larger share of the AI chip market [21].

Group 3: Industry Implications
- The collaboration is viewed as a critical development in the AI computing landscape, marking a shift in supply chain dynamics and competitive positioning within the industry [26].
- NVIDIA's stock declined after the announcement, reflecting the market's reaction to shifting alliances in the AI sector [24].
- OpenAI is also reportedly in discussions with Qualcomm to develop custom chips for future models, suggesting ongoing efforts to diversify its supply chain [26].
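The warrant and market-cap figures above are easy to sanity-check with a few lines of arithmetic. This is a rough sketch using only numbers quoted in the summary; the function names are hypothetical, and treating "approximately 10%" as a fraction of the current share count (rather than the post-exercise count) is an assumption.

```python
# Figures quoted in the summary above.
WARRANT_SHARES = 160_000_000   # maximum shares purchasable under the warrants
EXERCISE_PRICE_USD = 0.01      # exercise price per share
STAKE_FRACTION = 0.10          # "approximately 10%" if fully exercised
CAP_BEFORE_BN = 267.2          # market cap before the announcement (USD billions)
CAP_AFTER_BN = 330.6           # market cap after the announcement (USD billions)


def total_exercise_cost_usd(shares: int, price: float) -> float:
    """Cash needed to exercise every warrant at the quoted price."""
    return shares * price


def implied_share_count(shares: int, stake: float) -> float:
    """Share count implied if `shares` equals `stake` of the company.

    Assumption: the stake is measured against the current share count.
    """
    return shares / stake


def market_cap_gain_bn(before: float, after: float) -> float:
    """Increase in market capitalization, rounded to one decimal place."""
    return round(after - before, 1)


if __name__ == "__main__":
    print(round(total_exercise_cost_usd(WARRANT_SHARES, EXERCISE_PRICE_USD)))
    print(round(implied_share_count(WARRANT_SHARES, STAKE_FRACTION)))
    print(market_cap_gain_bn(CAP_BEFORE_BN, CAP_AFTER_BN))
```

Under these assumptions the full exercise would cost about $1.6 million, the implied share count is about 1.6 billion, and the market-cap gain works out to $63.4 billion, which matches the figure in the headline.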
Amazon's "blind" robot stuns in a 30-second parkour debut, led by Chinese researchers
量子位· 2025-10-06 05:42
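henry, reporting from Aofeisi (凹非寺)
量子位 | WeChat official account QbitAI

Have you ever seen a "blind" robot demo like this?

With no vision at all (no cameras, no radar, no perception unit of any kind), it picks up a 9-jin (4.5 kg) chair, climbs onto a 1-meter-high table, and then somersaults off.

It is not just showing off: when there is work to do, carrying boxes is no problem. It can also leap straight onto a table, and scrambling up a slope on hands and feet works just as well.

These smooth combos come from OmniRetarget, the first humanoid (legged) robot research result released by Amazon's robotics team FAR (Frontier AI for Robotics).

OmniRetarget lets reinforcement learning policies learn long-horizon loco-manipulation skills in complex environments and transfer zero-shot from simulation to a humanoid robot.

One commenter wrote: it does parkour and real work too, so isn't it 10 times better than Tesla's Optimus?

In addition, preserving task-relevant interactions makes efficient data augmentation possible, generalizing a single demonstration to different robot embodiments, terrains, and object configurations and cutting the cost of collecting data for each variant.

Compared with other motion retargeting methods, OmniRetarget shows an across-the-board advantage on all key aspects: hard constraints, object interaction, terrain interaction, and data augmentation.

| Methods | Hard Ki ...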
Sora 2 is still stuck around 5 seconds while ByteDance's AI video generation already "takes off" at 4 minutes
量子位· 2025-10-06 05:42
Core Insights
- ByteDance has developed a new method called Self-Forcing++ that generates long videos of up to 4 minutes and 15 seconds without compromising quality, a significant improvement over existing models that typically generate only 5 to 10 seconds [1][2][28]

Group 1: Technology and Methodology
- Self-Forcing++ does not require changing the model architecture or collecting new long-video datasets to generate high-quality long videos [1][2]
- The method optimizes the training process through noise initialization, distribution matching distillation, and a rolling KV cache mechanism [13][14][15]
- The model learns to generate stable long videos by iteratively correcting its own mistakes, which improves coherence and fidelity over extended durations [15][17]

Group 2: Performance Metrics
- In short-duration scenarios (5 seconds), Self-Forcing++ achieved a semantic score of 80.37 and a total score of 83.11, outperforming several existing models [22][23]
- For longer durations (50 seconds), it achieved a visual stability score of 90.94, significantly higher than competitors such as CausVid and Self-Forcing [24]
- On videos of 75 to 100 seconds, the model maintained high fidelity and consistency without common failure modes such as motion stagnation or quality degradation [26][28]

Group 3: Future Implications
- The advances in long video generation suggest that the era of AI-generated films may be approaching, with potential applications across media and entertainment [6][28]
- Self-Forcing++ could set new standards for video quality and generation capability, affecting how content is created and consumed [6][28]
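The summary names a rolling KV cache but does not describe how it is built. As a rough illustration of the general idea only, the sketch below keeps the key/value entries for the most recent `window` steps and evicts the oldest once the window is full. The class name `RollingKVCache`, the list-of-floats representation, and the window size are all assumptions for illustration, not ByteDance's implementation.

```python
from collections import deque
from typing import Deque, List, Tuple

# One cached entry: the (key, value) vectors produced at a generation step.
KV = Tuple[List[float], List[float]]


class RollingKVCache:
    """Fixed-size sliding window over the most recent key/value entries."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._entries: Deque[KV] = deque(maxlen=window)

    def append(self, key: List[float], value: List[float]) -> None:
        """Add the newest entry, evicting the oldest if the window is full."""
        self._entries.append((key, value))

    def keys(self) -> List[List[float]]:
        """Keys currently visible to attention, oldest first."""
        return [k for k, _ in self._entries]

    def values(self) -> List[List[float]]:
        """Values currently visible to attention, oldest first."""
        return [v for _, v in self._entries]

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    cache = RollingKVCache(window=3)
    for step in range(5):
        cache.append([float(step)], [float(step) * 10.0])
    print(cache.keys())  # prints [[2.0], [3.0], [4.0]]
```

Because memory use is bounded by the window rather than by video length, a cache of this shape lets generation continue well past the horizon the model was originally trained on.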
Reborn as Sam Altman in Minecraft: a netizen hand-builds ChatGPT
量子位· 2025-10-06 05:42
Core Viewpoint
- The article discusses the impressive achievement of creating a ChatGPT model within the game Minecraft, showcasing the potential of using redstone circuits to simulate complex computational tasks [1][2][4].

Group 1: Model Specifications
- The constructed ChatGPT model has approximately 5 million parameters, specifically 5,087,280 [16].
- It is trained on the TinyChat dataset, with an embedding dimension of 240 and a vocabulary of 1,920 tokens [18].
- The model has 6 layers and 5 attention heads, with a context window of 64 tokens, suitable for very short conversations [19].

Group 2: Construction Process
- The process starts by training a small GPT model on a personal computer, compressing the weights to low precision, and exporting the model structure [25].
- The next steps translate the computations into the game's block language and define reusable circuit modules [26][27].
- Finally, a "compiler" script maps the trained model onto redstone modules, which makes it feasible to build the entire setup [28][30].

Group 3: Redstone Circuit Functionality
- Redstone circuits in Minecraft operate on binary logic, where a signal is either on (1) or off (0), allowing players to build complex logic gates and circuits [32][34].
- This makes it possible to build basic computational systems such as adders and counters, and from there CPUs and neural networks [34].

Group 4: Broader Implications
- The article notes that building computational systems in Minecraft is still in its infancy, with only about 1% of the potential explored [37].
- Other Minecraft projects include CNNs for digit recognition, various games, and even an internet simulation [39][46].
- The narrative suggests that Minecraft players may eventually surpass current AI capabilities, hinting at a future where Minecraft could play a role in advancing artificial general intelligence (AGI) [48][49].
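The parameter total in Group 1 can be checked against the other quoted hyperparameters. The sketch below is one accounting that happens to reproduce 5,087,280 exactly; the architectural details it relies on (no bias terms, learned position embeddings, scale-only normalization, a 4x MLP, and an output head not tied to the token embedding) are assumptions, since the summary does not state them.

```python
def gpt_param_count(
    vocab_size: int,
    d_model: int,
    n_layers: int,
    context_len: int,
    mlp_ratio: int = 4,
) -> int:
    """Count parameters of a small GPT under the assumptions listed above."""
    token_embedding = vocab_size * d_model
    position_embedding = context_len * d_model      # learned positions (assumed)
    attention = 4 * d_model * d_model               # Q, K, V, output; no biases
    mlp = 2 * d_model * (mlp_ratio * d_model)       # up and down projections
    norms = 2 * d_model                             # two scale-only norms per layer
    per_layer = attention + mlp + norms
    final_norm = d_model
    output_head = vocab_size * d_model              # untied output head (assumed)
    return (
        token_embedding
        + position_embedding
        + n_layers * per_layer
        + final_norm
        + output_head
    )


if __name__ == "__main__":
    total = gpt_param_count(vocab_size=1920, d_model=240, n_layers=6, context_len=64)
    print(total)     # prints 5087280
    print(240 // 5)  # per-head dimension with 5 heads: prints 48
```

Note that the number of attention heads does not change the total: the heads split the 240-dimensional projections into 5 slices of 48 dimensions each rather than adding weights.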
Just in: a new global king of AI image generation! Tencent's Hunyuan Image 3.0 takes the top spot
量子位· 2025-10-05 05:43
Core Viewpoint
- The article highlights that Tencent's Hunyuan Image 3.0 has claimed the top position in the global text-to-image model rankings, surpassing competitors such as Google's Nano Banana and ByteDance's Seedream [1][2][7].

Group 1: Model Performance and Ranking
- Hunyuan Image 3.0 achieved a score of 1167, leading the rankings among 26 models, with a total of 3,608 votes [1][3].
- The model outperformed Google's Nano Banana, ByteDance's Seedream, and OpenAI's GPT-Image in the text-to-image domain [1][7].

Group 2: Model Architecture and Features
- Hunyuan Image 3.0 is based on a native multimodal architecture that processes text, image, video, and audio inputs without relying on multiple models [12].
- With 80 billion parameters, it is the largest open-source text-to-image model currently available [13].
- It uses a generalized causal attention mechanism to handle heterogeneous data modalities, combining autoregressive text generation with global attention for image generation [41][42].

Group 3: Training and Data Processing
- The training data came from a three-stage filtering process that selected nearly 5 billion high-quality images from over 10 billion raw images [53].
- Training proceeded in four progressive stages that build up the model's multimodal understanding and generation capabilities [56][59].

Group 4: Evaluation and Comparison
- Hunyuan Image 3.0 was evaluated with both automated metrics (SSAE) and human assessments (GSB), demonstrating superior performance compared to leading closed-source models [61][65].
- In human evaluations, Hunyuan Image 3.0 outperformed Seedream 4.0 by 1.17% and Nano Banana by 2.64% [65].

Group 5: Market Impact and User Engagement
- The launch of Hunyuan Image 3.0 has generated significant interest and engagement among users, particularly during the festive season [67].
- The model can generate detailed visual content such as retro ticket collages and complex fantasy scenes, showing its versatility [70][76].
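The generalized causal attention mentioned in Group 2 can be pictured as an attention mask. The sketch below is one plausible reading of the summary, not Tencent's implementation: text tokens attend only to earlier positions (causal), while tokens belonging to the same image also attend to each other in both directions (global). The segment labels (`"T"` for text, `"I0"`, `"I1"` for image ids) and the function name are hypothetical.

```python
from typing import List


def generalized_causal_mask(segments: List[str]) -> List[List[bool]]:
    """Build a boolean mask where mask[q][k] means query q may attend to key k.

    segments[i] is "T" for a text token, or an image id such as "I0"
    for every token that belongs to that image (labels are assumptions).
    """
    n = len(segments)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            causal = k <= q  # standard autoregressive rule
            same_image = segments[q] != "T" and segments[q] == segments[k]
            mask[q][k] = causal or same_image
    return mask


if __name__ == "__main__":
    # Two text tokens, a two-token image, then one more text token.
    layout = ["T", "T", "I0", "I0", "T"]
    for row in generalized_causal_mask(layout):
        print("".join("1" if allowed else "0" for allowed in row))
```

In the printed mask, the row for the first image token has a 1 in the column of the second image token, which a purely causal mask would forbid; every text row stays lower-triangular.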
46% fewer reasoning tokens! Meta's new method shortens chains of thought and ends repeated derivations
量子位· 2025-10-05 05:43
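时令, reporting from Aofeisi (凹非寺)
量子位 | WeChat official account QbitAI

What can be done when large models keep repeating the same steps and their chains of thought grow longer and longer?

Meta, Mila-Quebec AI Institute, Université de Montréal, and Princeton University have jointly proposed a Metacognitive Reuse mechanism.

Put simply, the model reviews and summarizes its own problem-solving approach, distills frequently used reasoning patterns into more concise "behaviors", and stores them in a "Behavior Handbook".

When it meets a similar problem again, the model can call the corresponding behavior straight from the handbook without re-deriving it.

Experiments show that, across three use cases (behavior-conditioned inference, behavior-guided self-improvement, and behavior-conditioned supervised fine-tuning), the mechanism delivers significant gains on math benchmarks such as MATH and AIME, cutting reasoning token usage by up to 46% while keeping accuracy unchanged.

Details below.

Reducing recurring fragments to something simple

Today's large language models widely use chain-of-thought reasoning for complex tasks such as math and programming, so each new problem requires re-deriving the same generic sub-steps.

This inflates token usage and inference latency, and it also takes up context-window space, reducing the model's capacity to explore new paths.

Meanwhile, the memory systems of existing LLMs (such as RAG) store only "what ...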
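The store-then-reuse loop described in this excerpt can be sketched as a small lookup structure. This is a toy illustration under stated assumptions, not the paper's method: the class `BehaviorHandbook`, the word-overlap retrieval, the stopword list, and the example behaviors are all hypothetical stand-ins for whatever extraction and retrieval the authors actually use.

```python
import re
from typing import Dict, List, Set

# Tiny stopword list so common words do not count as matches (assumption).
STOPWORDS: Set[str] = {"a", "an", "the", "of", "and", "is", "are", "to", "what"}


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric words with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


class BehaviorHandbook:
    """Maps a behavior name to a short, reusable reasoning instruction."""

    def __init__(self) -> None:
        self._behaviors: Dict[str, str] = {}

    def add(self, name: str, instruction: str) -> None:
        self._behaviors[name] = instruction

    def retrieve(self, question: str, top_k: int = 2) -> List[str]:
        """Return up to top_k behavior names ranked by word overlap."""
        query = _words(question)
        scored = []
        for name, instruction in self._behaviors.items():
            score = len(query & _words(name + " " + instruction))
            if score > 0:
                scored.append((-score, name))  # sort by score, then name
        return [name for _, name in sorted(scored)[:top_k]]

    def build_prompt(self, question: str, top_k: int = 2) -> str:
        """Prepend the retrieved behaviors so they need not be re-derived."""
        lines = [f"{n}: {self._behaviors[n]}" for n in self.retrieve(question, top_k)]
        return "\n".join(lines + [f"Question: {question}"])


if __name__ == "__main__":
    handbook = BehaviorHandbook()
    handbook.add("behavior_triangle_angle_sum",
                 "The interior angles of a triangle sum to 180 degrees.")
    handbook.add("behavior_inclusion_exclusion",
                 "Add the set sizes, then subtract the sizes of the intersections.")
    print(handbook.build_prompt(
        "Two angles of a triangle are 50 and 60 degrees; what is the third angle?"))
```

The token saving in such a scheme would come from the model citing a one-line behavior instead of spelling out the derivation again; real retrieval would likely use topic labels or embeddings rather than raw word overlap.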