量子位
Amazon's "blind" robot dazzles in a 30-second parkour debut, with Chinese researchers in the lead
量子位· 2025-10-06 05:42
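By henry, from Aofeisi. 量子位 | WeChat official account: QbitAI. Have you ever seen a "blind" robot demo like this? Without being able to see at all (no camera, radar, or any other perception unit), it picks up a 4.5 kg (9 jin) chair, climbs onto a 1-meter-high table, and then somersaults off. It is not just showing off: when there is work to do, carrying boxes is no problem either. It can also leap onto a table in a single bound, and scrambling up a slope on hands and feet works just as well. These smooth combos come from OmniRetarget, the first humanoid (legged) robot research result released by Amazon's robotics team FAR (Frontier AI for Robotics). OmniRetarget enables reinforcement learning policies to learn long-horizon loco-manipulation skills in complex environments and to transfer zero-shot from simulation to a humanoid robot. One commenter asked: it can do parkour and real work, so isn't that 10 times better than Tesla's Optimus? In addition, preserving task-relevant interactions allows efficient data augmentation, generalizing from a single demonstration to different robot embodiments, terrains, and object configurations, which reduces the cost of collecting data for each variant. Compared with other motion retargeting methods, OmniRetarget shows an across-the-board advantage in every key aspect: hard constraints, object interaction, terrain interaction, and data augmentation. | Methods | Hard Ki ...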
Sora 2 is still stuck at 5 seconds, while ByteDance's AI video generation has already "taken off" at 4 minutes
量子位· 2025-10-06 05:42
Core Insights - ByteDance has developed a new method called Self-Forcing++ that enables the generation of long videos up to 4 minutes and 15 seconds without compromising quality, a significant improvement over existing models that typically generate videos of only 5 to 10 seconds [1][2][28] Group 1: Technology and Methodology - Self-Forcing++ utilizes a unique approach that does not require changing model architecture or collecting new long video datasets, allowing for the generation of high-quality long videos [1][2] - The method improves video generation by optimizing the training process through noise initialization, distribution matching distillation, and a rolling KV cache mechanism [13][14][15] - The model learns to generate stable long videos by iteratively correcting its mistakes, enhancing its ability to produce coherent and high-fidelity content over extended durations [15][17] Group 2: Performance Metrics - In short-duration scenarios (5 seconds), Self-Forcing++ achieved a semantic score of 80.37 and a total score of 83.11, outperforming several existing models [22][23] - For longer durations (50 seconds), it achieved a visual stability score of 90.94, significantly higher than competitors like CausVid and Self-Forcing [24] - The model demonstrated exceptional performance in generating videos of 75 to 100 seconds, maintaining high fidelity and consistency without common failure modes such as motion stagnation or quality degradation [26][28] Group 3: Future Implications - The advancements in long video generation suggest that the era of AI-generated films may be approaching, with potential applications in various media and entertainment sectors [6][28] - The introduction of Self-Forcing++ could lead to new standards in video quality and generation capabilities, impacting how content is created and consumed in the digital landscape [6][28]
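The summary names a rolling KV cache but does not describe how it is implemented. As a minimal sketch of the general idea only, assuming a fixed-size window from which the oldest entries are evicted first: the class name `RollingKVCache`, the window size, and the string placeholders standing in for key/value tensors are all hypothetical and are not taken from Self-Forcing++.

```python
from collections import deque


class RollingKVCache:
    """Keeps only the most recent `window` key/value pairs (hypothetical sketch)."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest item once it is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


# Simulate generating 10 chunks with a window of 4 cached entries.
cache = RollingKVCache(window=4)
for step in range(10):
    cache.append(f"k{step}", f"v{step}")

print(list(cache.keys))  # ['k6', 'k7', 'k8', 'k9']
```

Memory stays constant no matter how long generation runs, which is the usual reason to use a rolling window for long sequences. How the real model handles positions and eviction is not stated in this summary.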
Reborn as Sam Altman in Minecraft: a player hand-builds ChatGPT in the game
量子位· 2025-10-06 05:42
Core Viewpoint - The article discusses the impressive achievement of creating a ChatGPT model within the game Minecraft, showcasing the potential of using redstone circuits to simulate complex computational tasks [1][2][4]. Group 1: Model Specifications - The constructed ChatGPT model has approximately 5 million parameters, specifically 5,087,280 [16]. - It utilizes a TinyChat dataset for training, with an embedding dimension of 240 and a vocabulary of 1,920 tokens [18]. - The model features 6 layers and 5 attention heads, with a context window size of 64 tokens, suitable for very short conversations [19]. Group 2: Construction Process - The process involves training a small GPT model on a personal computer, compressing weights to low precision, and exporting the model structure [25]. - The next steps include translating computational methods into pixel block language and defining reusable circuit modules [26][27]. - Finally, a "compiler" script is used to map the trained model to redstone modules, facilitating the construction of the entire setup [28][30]. Group 3: Redstone Circuit Functionality - Redstone circuits in Minecraft operate on binary logic, where signals can be either on (1) or off (0), allowing players to build complex logic gates and circuits [32][34]. - This capability enables the construction of basic computational systems, such as adders and counters, leading to the potential for creating CPUs and neural networks [34]. Group 4: Broader Implications - The article highlights that the development of computational systems in Minecraft is still in its infancy, with only about 1% of the potential explored [37]. - Other projects within Minecraft include building CNNs for digit recognition and creating various games and even an internet simulation [39][46]. 
- The narrative suggests that players in Minecraft may eventually surpass current AI capabilities, hinting at a future where Minecraft could play a role in advancing artificial general intelligence (AGI) [48][49].
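The figures quoted in Group 1 can be sanity-checked with simple arithmetic. The sketch below counts the parameters of a GPT-style model under assumptions the summary does not confirm: no bias terms, weight-only normalization layers (two per block plus a final one), a learned positional embedding, a 4x MLP expansion, and an output head that is not tied to the token embedding. Under exactly these assumptions the total happens to reproduce the reported 5,087,280. The function name `gpt_param_count` is hypothetical.

```python
def gpt_param_count(vocab, d_model, n_layers, context, mlp_mult=4):
    """Parameter count for a GPT-style decoder under the assumptions stated above."""
    token_emb = vocab * d_model                # token embedding table
    pos_emb = context * d_model                # learned positional embedding (assumed)
    attn = 4 * d_model * d_model               # Q, K, V and output projections, no bias
    mlp = 2 * d_model * (mlp_mult * d_model)   # up- and down-projection, no bias
    norms = 2 * d_model                        # two weight-only norms per block (assumed)
    per_layer = attn + mlp + norms
    final_norm = d_model                       # one weight-only norm before the head
    lm_head = vocab * d_model                  # untied output projection (assumed)
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


total = gpt_param_count(vocab=1920, d_model=240, n_layers=6, context=64)
print(total)  # 5087280
```

The number of attention heads (5, giving a head dimension of 48) does not appear in the count because splitting the 240-dimensional projections into heads does not change their size.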
Just in: a new global king of AI image generation is crowned as Tencent Hunyuan Image 3.0 takes the top spot
量子位· 2025-10-05 05:43
Core Viewpoint - The article highlights that Tencent's Hunyuan Image 3.0 has claimed the top position in the global text-to-image model rankings, surpassing competitors like Google's Nano Banana and ByteDance's Seedream [1][2][7]. Group 1: Model Performance and Ranking - Hunyuan Image 3.0 achieved a score of 1167, leading the rankings among 26 models, with a total of 3,608 votes [1][3]. - The model outperformed Google's Nano Banana, ByteDance's Seedream, and OpenAI's GPT-Image, showcasing its competitive edge in the text-to-image domain [1][7]. Group 2: Model Architecture and Features - Hunyuan Image 3.0 is based on a native multimodal architecture, capable of processing text, images, videos, and audio inputs without relying on multiple models [12]. - The model has a parameter scale of 80 billion, making it the largest open-source text-to-image model currently available [13]. - It employs a generalized causal attention mechanism to effectively handle heterogeneous data modalities, integrating both autoregressive text generation and global attention for image generation [41][42]. Group 3: Training and Data Processing - The model was trained using a comprehensive three-stage filtering process, selecting nearly 5 billion high-quality images from over 10 billion raw images [53]. - The training strategy involved four progressive stages, enhancing the model's capabilities in multimodal understanding and generation [56][59]. Group 4: Evaluation and Comparison - Hunyuan Image 3.0 was evaluated using both automated metrics (SSAE) and human assessments (GSB), demonstrating superior performance compared to leading closed-source models [61][65]. - In human evaluations, Hunyuan Image 3.0 outperformed Seedream 4.0 by 1.17% and Nano Banana by 2.64%, indicating its competitive standing in the industry [65]. 
Group 5: Market Impact and User Engagement - The launch of Hunyuan Image 3.0 has generated significant interest and engagement among users, particularly during the festive season, reflecting its strong market presence [67]. - The model's capabilities extend to generating detailed visual content, such as retro ticket collages and complex fantasy scenes, showcasing its versatility and creativity [70][76].
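Group 2 mentions a generalized causal attention mechanism that combines autoregressive text generation with global attention for images. One plausible reading, sketched below as an assumption rather than the model's confirmed design, is that every token attends to all earlier tokens, while tokens belonging to the same image also attend to each other in both directions. The function name and the `segments` encoding (None for a text token, an integer id for an image token) are hypothetical.

```python
def generalized_causal_mask(segments):
    """Return mask[i][j] == True when query token i may attend to key token j.

    segments[i] is None for a text token, or an image id for an image token.
    """
    n = len(segments)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i  # standard autoregressive rule
            same_image = segments[i] is not None and segments[i] == segments[j]
            mask[i][j] = causal or same_image
    return mask


# Two text tokens, a three-token image (id 0), then one more text token.
segments = [None, None, 0, 0, 0, None]
for row in generalized_causal_mask(segments):
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid the image tokens form a fully filled square block, while the text rows keep the usual lower-triangular shape.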
46% fewer reasoning tokens! Meta's new method shortens chains of thought and puts an end to repeated derivation
量子位· 2025-10-05 05:43
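By 时令, from Aofeisi. 量子位 | WeChat official account: QbitAI. What can be done when a large model keeps retracing the same steps and its chain of thought grows ever longer? Meta, Mila-Quebec AI Institute, the Université de Montréal, and Princeton University have jointly proposed a Metacognitive Reuse mechanism. Put simply, the model reviews and summarizes its own problem-solving approach, distills frequently used reasoning patterns into more concise "behaviors", and stores them in a "Behavior Handbook". When a similar problem comes up again, the model can call the corresponding behavior directly from the handbook without deriving it again. Experimental results show that, across three application settings (behavior-conditioned inference, behavior-guided self-improvement, and behavior-conditioned supervised fine-tuning), the mechanism brings significant gains on math benchmarks such as MATH and AIME, cutting reasoning token usage by up to 46% while keeping accuracy unchanged. Details follow. Condensing recurring fragments: today, large language models widely use chain-of-thought reasoning when solving complex tasks such as math and programming, so every new problem requires deriving common sub-steps all over again. This not only inflates token usage and increases inference latency, it also takes up context window space and reduces the model's ability to explore new paths. Meanwhile, the memory systems of existing LLMs (such as RAG) only store "what ...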
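To make the handbook idea concrete, here is a minimal sketch of storing named behaviors and placing the relevant ones in front of a new question, roughly in the spirit of behavior-conditioned inference. Everything in it is an assumption for illustration: the class `BehaviorHandbook`, the keyword-overlap retrieval, the prompt layout, and the two sample behaviors are hypothetical and are not taken from the paper.

```python
import re


class BehaviorHandbook:
    """Stores short reusable reasoning 'behaviors' keyed by name (hypothetical sketch)."""

    def __init__(self):
        self._entries = {}

    def add(self, name, instruction, keywords):
        self._entries[name] = (instruction, {k.lower() for k in keywords})

    def retrieve(self, question, top_k=2):
        # Score each behavior by how many of its keywords appear in the question.
        words = set(re.findall(r"[a-z0-9]+", question.lower()))
        scored = []
        for name, (instruction, keywords) in self._entries.items():
            overlap = len(words & keywords)
            if overlap > 0:
                scored.append((overlap, name, instruction))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(name, instruction) for _, name, instruction in scored[:top_k]]


def build_prompt(handbook, question):
    """Prepend any retrieved behaviors so the model need not re-derive them."""
    hits = handbook.retrieve(question)
    if not hits:
        return "Question: " + question
    lines = [f"- {name}: {instruction}" for name, instruction in hits]
    return "Relevant behaviors:\n" + "\n".join(lines) + "\n\nQuestion: " + question


handbook = BehaviorHandbook()
handbook.add("behavior_complement_counting",
             "Count the complement and subtract it from the total.",
             ["probability", "least", "counting"])
handbook.add("behavior_angle_sum",
             "The interior angles of a triangle sum to 180 degrees.",
             ["triangle", "angle", "angles"])

print(build_prompt(handbook, "What is the probability of at least one six in two dice rolls?"))
```

A real system would more likely retrieve by embedding similarity than by keyword overlap; the keyword version is used here only to keep the sketch dependency-free.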
2025 AI annual awards selection opens! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位· 2025-10-05 05:43
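By the organizing committee, from Aofeisi. 量子位 | WeChat official account: QbitAI. To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more peers and fellow travelers, we are officially opening registration for the "2025 AI Annual List" selection. This is the 8th year of the 量子位 AI annual list. Over these eight years we have witnessed technology breaking through and reaching deployment, industries converging and being reshaped, and wave after wave of companies, people, and products pushing the era forward. In an era when AI is redefining everything, intelligent technology is no longer a single tool but a driving force in the co-evolution of industry and society. Through this annual selection we hope to discover and pay tribute to the explorers and practitioners who truly lead change and push boundaries. The selection covers three dimensions (companies, products, and people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward. Company list: 2025 AI Potential Startup of the Year. Product list. People list: 2025 AI Person in Focus of the Year. Detailed selection criteria and registration instructions follow. 2025 AI Leading Company of the Year. 2025 AI Leading Company of the Year; 2025 AI Potential Startup of the Year; 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year. Open to China's AI sector, this award selects the companies with the strongest overall capabilities. Eligibility: Selection criteria: Focused on China's ...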
2025 AI annual awards selection opens! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位· 2025-10-04 04:13
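By the organizing committee, from Aofeisi. 量子位 | WeChat official account: QbitAI. To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more peers and fellow travelers, we are officially opening registration for the "2025 AI Annual List" selection. This is the 8th year of the 量子位 AI annual list. Over these eight years we have witnessed technology breaking through and reaching deployment, industries converging and being reshaped, and wave after wave of companies, people, and products pushing the era forward. In an era when AI is redefining everything, intelligent technology is no longer a single tool but a driving force in the co-evolution of industry and society. Through this annual selection we hope to discover and pay tribute to the explorers and practitioners who truly lead change and push boundaries. The selection covers three dimensions (companies, products, and people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward. Company list. Product list. People list: 2025 AI Person in Focus of the Year. Detailed selection criteria and registration instructions follow. 2025 AI Leading Company of the Year. 2025 AI Leading Company of the Year; 2025 AI Potential Startup of the Year; 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year. Open to China's AI sector, this award selects the companies with the strongest overall capabilities. Eligibility: Selection criteria: 2025 AI Potential Startup of the Year. Focused on China's ...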
AI spends 17 hours writing a 30-page academic paper! It chose its own topic, ran experiments, and followed APA formatting
量子位· 2025-10-04 04:13
Core Insights - The article discusses an AI system named Virtuous Machines that autonomously conducted research, producing a 30-page academic paper in 17 hours at a cost of $114 [1][3][24] - The research focused on cognitive psychology, specifically human visual cognition [5][24] Research Process - The AI system generated research questions based on cognitive psychology theories, such as the relationship between visual working memory and mental rotation abilities [9] - It designed an experimental plan, calculated sample size, controlled variables, and measured participants' mental imagery clarity using the VVIQ2 scale [11] - The AI recruited 288 participants through the Prolific online platform, ultimately collecting 277 valid responses [11] - Data analysis involved writing Python code for repeated measures ANOVA, identifying outliers, and adjusting statistical models [12] AI System Architecture - The AI's research capabilities stem from a collaborative structure that simulates human cognitive mechanisms and allows dynamic knowledge interaction [14][21] - The core control module, referred to as Master, oversees the entire process, while other AI assistants focus on specific tasks like literature retrieval and data analysis [15][16] - The system's foundational abilities include knowledge retrieval, abstract reasoning, metacognitive reflection, task decomposition, and autonomous iteration [20] Efficiency and Limitations - The AI's efficiency is highlighted, being over ten times faster than human teams, with rigorous data analysis that avoids statistical pitfalls [24] - However, it occasionally misinterprets theories, mislabels chart axes, and confuses terms, indicating limitations in theoretical depth and innovative thinking compared to human researchers [25][26]
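The summary says the system wrote Python code for a repeated measures ANOVA but does not show it. As a rough illustration of what such an analysis computes, here is a dependency-free sketch of a one-way repeated-measures ANOVA. The function name and the toy scores are invented for this example; they are not the study's code or data, and the sketch leaves out the outlier handling and model adjustments the article mentions.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA.

    data[i][j] is the score of subject i under condition j.
    Returns (F, df_conditions, df_error). Assumes the error sum of squares is non-zero.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)   # between conditions
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)   # between subjects
    ss_error = ss_total - ss_cond - ss_subj                   # residual

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Toy data: 3 subjects measured under 3 conditions (made up for illustration).
scores = [[1, 2, 3],
          [2, 3, 7],
          [3, 7, 8]]
f_stat, df_cond, df_error = repeated_measures_anova(scores)
print(f_stat, df_cond, df_error)  # 8.0 2 4
```

Removing the between-subject variance from the error term is what distinguishes this design from an ordinary one-way ANOVA: each participant serves as their own control.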
Terence Tao solves a hard math problem with GPT-5: just 29 lines of Python code
量子位· 2025-10-04 04:13
Core Insights - The article highlights how AI, specifically GPT-5, has significantly aided mathematician Terence Tao in solving complex mathematical problems, reducing the time and effort required for manual calculations and coding [1][2][3]. Group 1: AI's Role in Mathematics - Terence Tao expressed that without AI assistance, completing similar tasks would take several hours, primarily due to manual coding and debugging [1]. - Tao utilized GPT-5 to tackle a problem on MathOverflow regarding the relationship between the least common multiple sequence and highly abundant numbers, which required extensive numerical searches [7][10]. - The AI's ability to assist in this mathematical inquiry marks a new era of collaboration between humans and machines in exploring complex problems [5][29]. Group 2: Problem-Solving Process - Initially, Tao attempted to have GPT-5 generate a Python program to search for counterexample parameters but faced issues with long execution times and improper initial parameters [19][20]. - He then shifted to a step-by-step dialogue with GPT-5, breaking down the larger problem into smaller, manageable parts, which ultimately led to the successful generation of the required parameters [21][22]. - The final solution involved a concise 29-line Python script generated by GPT-5, which Tao used for independent verification, confirming the results aligned with his heuristic predictions [23][24]. Group 3: Broader Implications of AI in Research - This instance is not the first time Tao has employed AI for mathematical problem-solving; he has previously used AI for various projects, demonstrating its potential as a mediator in mathematical proofs [27][28]. - The article suggests that while AI may not achieve accolades like the Fields Medal in the short term, it can significantly enhance the efficiency and effectiveness of mathematical research [28][29].
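For readers unfamiliar with the two objects involved, the sketch below builds the sequence lcm(1, ..., n) and checks by brute force which of its small terms are highly abundant, meaning the sum of divisors exceeds that of every smaller positive integer. It only illustrates the definitions on small cases: it is not the 29-line script described above, which this summary does not reproduce, and the search Tao describes required far larger parameters than brute force can reach. All function names are hypothetical.

```python
from math import gcd


def sigma_table(limit):
    """sigma[m] = sum of the divisors of m, for every m up to limit (sieve)."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def highly_abundant_up_to(limit):
    """Set of n <= limit whose divisor sum beats that of every smaller integer."""
    sigma = sigma_table(limit)
    found, best = set(), 0
    for n in range(1, limit + 1):
        if sigma[n] > best:
            found.add(n)
            best = sigma[n]
    return found


def lcm_sequence(n_max):
    """[lcm(1), lcm(1, 2), ..., lcm(1, ..., n_max)]."""
    seq, current = [], 1
    for n in range(1, n_max + 1):
        current = current * n // gcd(current, n)
        seq.append(current)
    return seq


lcms = lcm_sequence(10)
highly_abundant = highly_abundant_up_to(max(lcms))
for n, value in enumerate(lcms, start=1):
    print(n, value, value in highly_abundant)
```

The sieve costs roughly limit x log(limit) operations, which is why this approach stops being practical long before the sizes where a counterexample would have to be sought.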
OpenAI hits back hard at Musk's trade-secret theft lawsuit! xAI accused of maliciously doxxing former employees
量子位· 2025-10-04 04:13
Core Viewpoint - OpenAI has responded strongly to the lawsuit filed by xAI, denying all allegations of corporate espionage and asserting that the lawsuit is an attempt to intimidate its employees [2][3][10]. Group 1: Allegations by xAI - xAI has made three main allegations against OpenAI: violation of federal trade secret laws, intentional interference with xAI's economic relationships with its employees, and violation of California's unfair competition laws [11]. - Specific incidents cited include the alleged theft of proprietary information by former xAI engineers Xuechen Li and Jimmy Fraiture, who are accused of transferring sensitive data to OpenAI [12][14][15]. - xAI also claims that a former senior finance executive left without signing a confidentiality agreement and took critical strategic information to OpenAI [19][20]. Group 2: OpenAI's Defense - OpenAI has categorically denied the allegations, stating that Xuechen Li never officially joined the company and did not transfer any proprietary information [27][29]. - Regarding Jimmy Fraiture, OpenAI asserts that any actions taken during his "garden leave" were personal and not directed by OpenAI, and that no confidential information was received [31][32]. - OpenAI emphasizes that the unnamed finance executive's departure was unrelated to any alleged poaching and was due to refusing to engage in improper financial practices at xAI [33][34]. Group 3: Legal Proceedings - OpenAI has filed a motion to dismiss xAI's lawsuit, arguing that the claims lack merit and that the inclusion of names of former employees not accused of wrongdoing is an act of intimidation [37]. - A hearing for this motion is scheduled for November 18, 2025, which will address procedural matters rather than the substantive issues of the case [38].