量子位
Sora2 can even predict ChatGPT's output
量子位· 2025-10-02 05:30
Wen Le, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

Sora2 is pushing hard: it can even predict ChatGPT's output and render HTML?!

Asked to simulate "messaging ChatGPT," it not only generated the on-screen scene but also played out a question-and-answer "interaction." First it made up a question: Write a playful haiku about a cat staring out the window. Then, mimicking a ChatGPT reply, it produced an audio response: "Whiskers pressed to glass. Birds gossip beyond the pane. Tail flicks. Daydreams fly." The whole answer was delivered in ChatGPT's mechanical female voice, and the haiku's syllable counts lined up perfectly. This real-world test of video scene plus LLM reasoning amazed a crowd of netizens, some of whom said "Sora2 blurs the boundary between video generation and interactive AI."

In fact, beyond predicting ChatGPT-style reasoning replies like this, Sora2 can also render HTML. Here is what that code looks like when rendered in a real browser: it passed the glass-refraction test, and others had Sora2 render ...
The company of Murati, Lilian Weng, and Danqi Chen releases its first product, slashing the barrier to large-model fine-tuning and aiming to reinvent OpenAI
量子位· 2025-10-02 03:26
Core Insights
- Thinking Machines Lab has launched its first product, Tinker, which simplifies model fine-tuning to the level of modifying Python code [1][12]
- The company has moved past the "zero product, zero revenue" stage in which it drew a valuation of $84 billion [2]

Product Overview
- Tinker is a flexible API designed for fine-tuning language models, allowing researchers to control algorithms and data without managing infrastructure [12][13]
- Initial support covers the Qwen3 and Llama3 model series, so switching between small and large models takes only a one-string change in Python code [15]
- Tinker's API automates low-level training steps while handling scheduling, scaling, and error recovery [17]

Technical Features
- Tinker uses LoRA so that multiple training tasks can share the same GPU, reducing costs and enabling more parallel experiments [22]
- Tinker's gradient update strategy is: new parameters = original parameters + learning rate × advantage × gradient of log probability [28]

Industry Reception
- Tinker has drawn significant industry attention, with beta testers noting its excellent balance between abstraction and tunability compared with other fine-tuning tools [30]
- Research teams from prestigious institutions have already achieved notable results using Tinker [30]

Strategic Vision
- Thinking Machines Lab aims to reinvent a version of OpenAI that emphasizes open research sharing and greater freedom for researchers [10][11]
- The company's mission is to make cutting-edge models easier to customize to individual needs [14]
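The update rule quoted above is a REINFORCE-style policy-gradient step. A minimal numeric sketch of that formula (illustrative only; these names are not Tinker's actual API):

```python
import numpy as np

def policy_gradient_step(params, lr, advantage, grad_log_prob):
    """One update following the quoted rule:
    new = original + learning rate * advantage * gradient of log probability."""
    return params + lr * advantage * grad_log_prob

# Toy example: 3 parameters; a positive advantage moves the policy
# in the direction that makes the sampled action more likely.
params = np.zeros(3)
grad = np.array([0.5, -1.0, 2.0])   # gradient of log pi(action | state)
new_params = policy_gradient_step(params, lr=0.1, advantage=2.0, grad_log_prob=grad)
```

A negative advantage would flip the sign of the step, making the sampled action less likely.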
NVIDIA open-sources a batch of robotics technologies at once, including the physics engine co-developed with Disney
量子位· 2025-10-02 03:26
Core Viewpoint
- NVIDIA has made significant advances in robotics by releasing multiple open-source technologies, including the Newton physics engine, which enhances robots' physical intuition and reasoning capabilities and addresses key challenges in robot development [1][4][10]

Group 1: Newton Physics Engine
- The Newton physics engine targets the challenge of transferring skills learned in simulation to real-world applications, particularly for humanoid robots with complex joint structures [4]
- It is an open-source project managed by the Linux Foundation, built on NVIDIA's Warp and OpenUSD frameworks and using GPU acceleration to simulate intricate robot movements [4]
- Leading institutions such as ETH Zurich and Peking University have already begun using the Newton engine, indicating adoption by top-tier robotics companies and universities [3][4]

Group 2: Isaac GR00T N1.6 Model
- The Isaac GR00T N1.6 model integrates the Cosmos Reason visual language model, enabling robots to understand and execute vague commands, a longstanding challenge in the industry [5][6]
- The model allows robots to convert ambiguous instructions into actionable plans while performing simultaneous movements and object manipulations [6]
- Cosmos Reason has surpassed 1 million downloads, and the accompanying open-source physical-AI dataset has exceeded 4.8 million downloads, showcasing its popularity and utility [6]

Group 3: Training Innovations
- The Isaac Lab 2.3 developer preview introduces a new workflow for teaching robots to grasp objects, using an "automated curriculum" that gradually increases task difficulty [8]
- Boston Dynamics' Atlas robot has successfully applied this approach to enhance its manipulation capabilities [8]
- NVIDIA and its partners have developed the Isaac Lab Arena, a framework for large-scale experiments and standardized testing that streamlines the evaluation process for developers [8]

Group 4: Hardware Infrastructure
- NVIDIA has invested in hardware advances, including the GB200 NVL72 system, which integrates 36 Grace CPUs and 72 Blackwell GPUs and has already been adopted by major cloud service providers [9]
- Jetson Thor, equipped with Blackwell GPUs, supports multiple AI workflows for real-time intelligent interactions, with several partners already using the technology [9]
- Nearly half of the papers presented at CoRL referenced NVIDIA technologies, highlighting the company's influence in the robotics research community [9]

Group 5: Comprehensive Strategy
- NVIDIA's "full-stack" approach, spanning open-source physics engines, foundation models, training workflows, and hardware infrastructure, is redefining the landscape of robotics development [10]
- These advances suggest that the integration of robots into everyday life may occur sooner than anticipated [11]
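The "automated curriculum" described for Isaac Lab 2.3 can be sketched as a controller that promotes the policy to a harder task only once its recent success rate clears a threshold. This is a generic curriculum-learning sketch under that assumption, not NVIDIA's implementation:

```python
class AutoCurriculum:
    """Advance to a harder task level when the recent success rate is high enough."""

    def __init__(self, levels, promote_at=0.8, window=20):
        self.levels = levels          # ordered task difficulties, easy to hard
        self.promote_at = promote_at  # success rate required to advance
        self.window = window          # episodes per evaluation window
        self.level_idx = 0
        self.results = []

    @property
    def level(self):
        return self.levels[self.level_idx]

    def record(self, success: bool):
        """Log one episode outcome; promote at the end of each full window."""
        self.results.append(success)
        if len(self.results) >= self.window:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at and self.level_idx < len(self.levels) - 1:
                self.level_idx += 1   # task gets harder
            self.results = []         # start a fresh evaluation window

# Example: three grasping difficulties; a policy that always succeeds
# climbs one level per 20-episode window.
cur = AutoCurriculum(["easy", "medium", "hard"])
for _ in range(40):
    cur.record(True)
```

A policy that keeps failing simply stays at its current level until its success rate recovers, which is the point of the curriculum.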
Videos of a robot that stays upright under wild kicks go viral! With "space capsules" spread across city streets, Galaxy General is showing off
量子位· 2025-10-02 02:12
Core Viewpoint
- The article discusses advances in robotics technology, focusing on the Any2Track framework developed by Galaxy General Robotics, which enhances robots' ability to accurately mimic human movements while maintaining stability in real-world environments [2][7][29]

Group 1: Any2Track Framework
- Any2Track is a two-stage reinforcement learning framework that balances precise motion imitation and disturbance resistance, overcoming the challenge of achieving both generality and adaptability in robotic movement [7][8][12]
- The framework has two main components: AnyTracker, which focuses on general motion tracking, and AnyAdapter, which enables dynamic adaptation to environmental changes [10][17][28]
- Experimental results show that Any2Track significantly outperforms traditional methods in motion tracking accuracy and adaptability under various disturbances [30][32][36]

Group 2: Practical Applications
- Galaxy General Robotics has developed various end-to-end embodied models, such as GraspVLA and TrackVLA, which demonstrate significant breakthroughs in core tasks like precise manipulation and navigation [38][50]
- The "Galaxy Space Capsule" serves as a platform for deploying these robotic technologies in real-world scenarios, enhancing service capabilities in urban environments [40][50]
- The company aims to integrate intelligent robotics into everyday life, with applications ranging from retail to tourism, showcasing the potential of humanoid robots as a new technological hallmark for China [59][60]

Group 3: Technological Innovations
- The company employs a data paradigm that prioritizes synthetic data generation complemented by real data, addressing the scarcity of real-world data in embodied intelligence [54][58]
- This approach enables rapid, cost-effective production of high-quality data, accelerating the training and deployment of robots across diverse scenarios [55][56]
- The overarching goal is to enable humanoid robots to perform complex tasks across industries, enhancing productivity and service quality [58][59]
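The tracker-plus-adapter split described above can be illustrated with a deliberately simplified 1-D control sketch: a tracking controller follows a reference, while an adapter estimates an external push from the gap between commanded and observed motion and cancels it. This is a schematic reconstruction of the idea, not the authors' code:

```python
class AnyTrackerSketch:
    """Stage-1 stand-in: track a reference (here, a 1-D target position)."""

    def __init__(self, gain=0.5):
        self.gain = gain

    def action(self, state, target):
        return self.gain * (target - state)  # proportional tracking command


class AnyAdapterSketch:
    """Stage-2 stand-in: estimate the external disturbance as the gap between
    the commanded action and the observed state change, then cancel it."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.estimate = 0.0

    def update(self, commanded, observed_delta):
        # observed_delta = commanded + true disturbance, so the gap is the disturbance
        gap = observed_delta - commanded
        self.estimate = self.smoothing * self.estimate + (1 - self.smoothing) * gap

    def compensate(self):
        return -self.estimate


# Rollout under a constant push: the adapter learns the push and the
# tracker settles on the target despite it.
tracker, adapter = AnyTrackerSketch(), AnyAdapterSketch()
state, disturbance = 0.0, 0.3
for _ in range(300):
    a = tracker.action(state, target=1.0) + adapter.compensate()
    new_state = state + a + disturbance   # environment applies the push
    adapter.update(a, new_state - state)
    state = new_state
```

Without the adapter, the same proportional tracker would settle off-target (the constant push biases its equilibrium); with it, the residual error decays toward zero, which mirrors the paper's claimed benefit of online adaptation.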
ByteDance Seed releases PXDesign: protein design efficiency up tenfold, entering a new practical stage
量子位· 2025-10-01 03:03
Contributed by the ByteDance Seed team. QbitAI | WeChat official account QbitAI

AI protein design has entered a new stage! The Protenix project group of ByteDance's Seed team, which builds multimodal biomolecular structure models, recently proposed a scalable protein design method called PXDesign.

Protein design has long been a low-success-rate task: even DeepMind's AlphaProteo, backed by the AlphaFold model family, achieves only 9%-33% success on the same targets.

Generation: fast and efficient. In practical tests, PXDesign proved highly efficient, generating hundreds of high-quality candidate proteins within 24 hours, roughly a 10x efficiency gain over mainstream methods, and achieved wet-lab success rates of 20%-73% across multiple targets, a leading result in the field. The Protenix team has also launched a free public online binder-design service, letting scientists call on this capability directly, without building complex pipelines of their own, to accelerate research.

Background and significance: Proteins are the cornerstone of life. Half of the 2024 Nobel Prize in Chemistry went to David Baker (computational protein design), and the other half jointly to Demis Hassabis and John Jumper (protein structure prediction). This also highlights the challenge scientists care about: not just "predicting structures," but ...
A new synthesis framework hits SOTA: reinforcement learning as the engine, task synthesis as the fuel, jointly from Ant Group and HKU
量子位· 2025-10-01 03:03
Contributed by the AntResearchNLP team. QbitAI | WeChat official account QbitAI

What should large models bet on next? Around this time a year ago, while the whole AI community was pondering that question, OpenAI unveiled the o1 preview; its new deep-thinking paradigm and performance far beyond gpt4o on competition math and coding tasks ushered the community into the era of "deep thinking." Now, another September later, Ant Group and the University of Hong Kong are jointly betting the second half of the large-model game on task synthesis.

PromptCoT 2.0: a comprehensive upgrade of the PromptCoT framework

The NLP group of Ant Group's AGI center, together with the NLP group of the University of Hong Kong (hereafter "the team"), has released PromptCoT 2.0, a bet on task synthesis for the second half of the large-model era. Experiments show that through self-play-style training with a "strong starting point and strong feedback," PromptCoT 2.0 lets a 30B-A3B model reach new SOTA results on a range of math and code reasoning tasks, matching DeepSeek-R1-0528, OpenAI o3, and Gemini 2.5 Pro.

Why task synthesis? Following the AGI blueprint OpenAI laid out, the large-model community is racing from Reasoners toward Agents, with all kinds of Agent work, including search and software ...
What is the best programming language of 2025?
量子位· 2025-10-01 01:12
Core Viewpoint
- Python continues to dominate as the most popular programming language, achieving a remarkable lead over its competitors, particularly Java, in the IEEE Spectrum 2025 programming language rankings [2][4][5]

Group 1: Python's Dominance
- Python has held the top spot for ten consecutive years, a significant achievement in the IEEE Spectrum rankings [6]
- This year Python not only topped the overall ranking but also led in growth rate and employment orientation, becoming the first language to take this triple crown in the 12-year history of the IEEE rankings [7]
- The gap between Python and Java is substantial, indicating Python's strong growth trajectory [4][5]

Group 2: Python's Ecosystem and AI Influence
- Python's rise can be attributed to its simplicity and to powerful libraries such as NumPy, SciPy, matplotlib, and pandas, which have made it a favorite in scientific, financial, and data-analysis fields [10]
- Network effects have played a crucial role: a growing number of developers choosing Python and contributing to its ecosystem have built a robust community around it [11]
- AI has further amplified Python's advantages, since it has richer training data than other languages, making it the preferred choice for AI applications [12][13]

Group 3: Other Languages' Challenges
- JavaScript has seen the most significant decline, dropping from the top three to sixth place in the rankings [15]
- SQL, traditionally a highly valued skill, has also ceded ground to Python, though it remains a critical skill for database access [18][21][23]

Group 4: Changes in Programming Culture
- Community culture among programmers is declining, with a noticeable drop in activity on platforms like Stack Overflow as many now prefer to consult AI for problem-solving [25][26]
- The way programmers work is evolving: AI is taking over many tedious tasks, allowing developers to focus less on programming details [30][31]
- The diversity of programming languages may decrease as AI supports mainly mainstream languages, further concentrating usage in a few dominant ones [37][39]

Group 5: Future of Programming
- The programming landscape is undergoing a significant transformation, potentially leading to a future in which traditional programming languages become less relevant [41]
- While high-level languages like Python have simplified programming, the ultimate goal may shift toward direct interaction with compilers through natural-language prompts [46]
- The programmer's role may evolve to focus more on architecture design and algorithm selection than on maintaining extensive source code [49][50]
OpenAI suddenly releases Sora 2: quite the "AI version of TikTok"!
量子位· 2025-10-01 01:12
Core Viewpoint
- OpenAI has launched Sora 2, an AI-generated-video platform that functions much like TikTok, allowing users to create and share AI-generated content with enhanced realism and control [1][33]

Group 1: Sora 2 Features
- Sora 2 is an upgraded model that generates videos with improved adherence to physical laws, resulting in more realistic movements and interactions [7][11]
- The platform can generate complex scenes while maintaining logical consistency within the virtual environment [11]
- Users can inject real-world elements into the generated videos, integrating specific individuals into various AI-created scenarios [14][15]

Group 2: User Interaction and Control
- The Sora app provides tools for content creation, customization of information feeds, and secondary creation of AI content [15][37]
- Users have complete control over their likeness in the "cameo" feature and can authorize or revoke the use of their image in generated videos [24][38]
- The app aims to improve the user experience with a new recommendation algorithm built on OpenAI's existing language models [37]

Group 3: Market Position and Comparison
- Sora 2 is positioned as a competitor to existing AI video applications, such as Kuaishou's Keling, with users comparing the two platforms under identical prompts [42]
- The initial rollout of the Sora iOS app targets the North American market, a strategic entry point for OpenAI [33]
First-ever synchronized generation of egocentric video and human motion! New framework cracks the two technical barriers of viewpoint-motion alignment
量子位· 2025-10-01 01:12
Wen Le, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

AI is already well practiced at generating third-person video, but first-person (egocentric) generation remains immature. To change that, the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, and Shanghai AI Laboratory have jointly released EgoTwin, the first framework to jointly generate egocentric video and human motion. In one stroke it cracks the two key bottlenecks of viewpoint-motion alignment and causal coupling, opening a new door to deployment in wearable computing, AR, and embodied intelligence.

EgoTwin is a diffusion-model-based framework that jointly generates first-person video and human motion in a viewpoint-consistent and causally coherent way. The generated video can then be lifted into a 3D scene via 3D Gaussian Splatting, using camera poses derived from the human motion. Here are the details.

Synchronized generation of egocentric video and human motion

Core challenge: the "dilemma" of egocentric generation. Egocentric video is in essence a visual record driven by human motion: head movement determines the camera's position and orientation, while whole-body motion shapes body pose and the changes in the surrounding scene. The two are intrinsically coupled and cannot be separated, a property traditional video generation methods struggle to accommodate. Two main difficulties arise:

1. Viewpoint alignment: the camera trajectory in the generated video must precisely match the head trajectory derived from the human motion, but existing methods mostly rely on preset camera parameters ...
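The lifting step above needs camera poses derived from head motion. A common construction (shown here as an assumption, not EgoTwin's exact code) builds a camera-to-world matrix from head rotation and position, then inverts it to get the world-to-camera extrinsics a renderer consumes:

```python
import numpy as np

def head_to_extrinsics(R_head, t_head):
    """From head rotation (3x3) and position (3,), build the 4x4
    camera-to-world pose and its inverse, the world-to-camera extrinsics."""
    c2w = np.eye(4)
    c2w[:3, :3] = R_head
    c2w[:3, 3] = t_head
    # Rigid-transform inverse: [R | t]^-1 = [R^T | -R^T t]
    w2c = np.eye(4)
    w2c[:3, :3] = R_head.T
    w2c[:3, 3] = -R_head.T @ t_head
    return c2w, w2c

# Example: identity head orientation, head 1.6 m above the world origin.
R = np.eye(3)
t = np.array([0.0, 0.0, 1.6])
c2w, w2c = head_to_extrinsics(R, t)
```

In practice a fixed head-to-camera offset would also be composed in, since the camera sits slightly in front of the head's center of rotation; that offset is omitted here for brevity.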
Possibly the best open-source image generation model yet: HunyuanImage 3.0 is here
量子位· 2025-09-30 12:22
Core Viewpoint
- Tencent has released and open-sourced HunyuanImage 3.0, the largest open-source native multimodal image generation model at 80 billion parameters; it integrates understanding and generation capabilities and rivals leading closed-source models in the industry [1][20]

Model Features
- HunyuanImage 3.0 supports multi-resolution image generation and exhibits strong instruction adherence, world-knowledge reasoning, and text rendering, producing aesthetically pleasing and artistic outputs [1][11]
- The model inherits world-knowledge reasoning from Hunyuan-A13B, allowing it to solve complex tasks such as generating detailed steps for solving equations [4][5]
- It can handle intricate prompts, such as visualizing sorting algorithms in specific styles complete with pseudocode, showcasing its advanced text rendering abilities [7][11]

Technical Architecture
- The model is based on Hunyuan-A13B, using a native multimodal, unified autoregressive framework that deeply integrates text understanding, visual understanding, and high-fidelity image generation [17][19]
- Unlike traditional approaches, HunyuanImage 3.0 employs a dual-encoder structure and incorporates generalized causal attention to enhance both language reasoning and global image modeling [22][25]
- Training data went through three-stage filtering of more than 10 billion images to select nearly 5 billion high-quality, diverse images and remove low-quality data [32]

Training Strategy
- Training begins with a progressive four-stage pre-training process that gradually increases image resolution and complexity, culminating in a fine-tuning phase focused on text-to-image generation tasks [36][38]
- A multi-stage post-training strategy incorporates human-preference data to refine the generated outputs [38]

Evaluation Metrics
- Performance is assessed with both automated metrics (SSAE) and human evaluation (GSB), demonstrating competitive results against leading models in the industry [40][46]
- The model achieved a 14.10% higher win rate than its predecessor, HunyuanImage 2.1, indicating significant improvements in performance [46]
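One common way to realize the "generalized causal attention" mentioned above (a sketch under that assumption; the released code may differ) is to keep text tokens strictly causal while letting all tokens belonging to one image attend to each other bidirectionally:

```python
import numpy as np

def generalized_causal_mask(segments):
    """segments: list of ('text', n) or ('image', n) spans in sequence order.
    Returns a boolean mask where mask[i, j] = True means token i may attend
    to token j: text is causal, tokens within one image span see each other."""
    n_total = sum(n for _, n in segments)
    mask = np.tril(np.ones((n_total, n_total), dtype=bool))  # causal baseline
    start = 0
    for kind, n in segments:
        if kind == "image":
            # Full bidirectional attention inside this image's own span.
            mask[start:start + n, start:start + n] = True
        start += n
    return mask

# Example: a 2-token text prompt followed by a 3-token image.
m = generalized_causal_mask([("text", 2), ("image", 3)])
```

The image block still cannot look ahead at later segments, so autoregressive generation across segments is preserved while global modeling within each image is allowed.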