量子位
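Solving Scale Drift in Outdoor RGB-Only SLAM: Accurate Localization Plus High-Fidelity Reconstruction | ICCV'25, Open Source
量子位 (QbitAI) · 2025-07-18 06:16
Submitted by the S3PO-GS team | 量子位, WeChat official account QbitAI

The scale drift problem in outdoor SLAM finally has a new solution! The latest work from researchers at The Hong Kong University of Science and Technology (Guangzhou) is S3PO-GS, a 3D Gaussian framework built specifically for outdoor monocular SLAM, which has been accepted to ICCV 2025. The highlight of this work is that it achieves global scale consistency in RGB monocular SLAM for the first time. On three major outdoor benchmarks, Waymo, KITTI, and DL3DV, S3PO-GS not only sets a new SOTA in novel view synthesis but also cuts tracking error by 77.3% on DL3DV scenes.

What does this paper do?

In frontier fields such as autonomous driving, robot navigation, and AR/VR, the robustness of SLAM directly affects system performance. Current SLAM approaches based on 3D Gaussian Splatting (3DGS) perform very well indoors, but they still face serious challenges in unbounded outdoor environments where only RGB input is available. Monocular systems inherently lack depth priors, which leaves geometric information insufficient. Introducing monocular depth estimation or end-to-end pointmap models (such as MASt3R) as geometric priors, in turn, causes system-level scale drift because the scale is inconsistent from frame to frame, and the problem is especially pronounced in complex outdoor scenes. To address this twofold bottleneck, the HKUST (Guangzhou) research team proposes the S3PO-GS framework, which achieves global scale consistency in RGB monocular SLAM for the first time. The approach relies on three core techniques ...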
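To make the inter-frame scale inconsistency described above concrete, here is a minimal sketch. It is not the S3PO-GS method, whose three techniques are elided in the excerpt; it only illustrates the underlying problem with a generic least-squares scale alignment. The function names `least_squares_scale` and `accumulated_scale`, and all numbers, are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def least_squares_scale(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the scalar s minimizing sum((reference - s * predicted)^2).

    Generic closed-form scale alignment between two depth samples of the
    same points; an illustrative assumption, not the paper's algorithm.
    """
    denominator = sum(p * p for p in predicted)
    if denominator == 0:
        raise ValueError("predicted depths must not all be zero")
    numerator = sum(r * p for r, p in zip(reference, predicted))
    return numerator / denominator


def accumulated_scale(per_frame_scales: Sequence[float]) -> List[float]:
    """Chain per-frame relative scale factors to show how small errors compound."""
    history: List[float] = []
    total = 1.0
    for scale in per_frame_scales:
        total *= scale
        history.append(total)
    return history


# Toy depths of the same three points seen from two frames (hypothetical values).
# The second frame's depth prior is expressed in a different, unknown scale.
frame_a_depths = [2.0, 4.0, 6.0]
frame_b_depths = [1.0, 2.0, 3.0]
relative_scale = least_squares_scale(frame_a_depths, frame_b_depths)

# If every new frame is off by 5% relative to the previous one and nothing
# corrects it, the error grows multiplicatively along the trajectory.
drift = accumulated_scale([1.05, 1.05, 1.05])
```

In this toy setup the per-frame mismatch looks harmless, but chaining it frame after frame is what turns a small inconsistency into system-level drift, which is why a globally consistent scale matters.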
Over $10 Million in One Year: An Overseas AI Creative Engine Takes Off
量子位 (QbitAI) · 2025-07-18 06:16
Core Viewpoint: Creati, an AI-driven creative engine, has rapidly gained traction in the advertising sector, amassing 10 million users and generating more than $10 million in annual revenue within just one year of its launch [5][6].

Group 1: AI Creative Engine
- Creati focuses on automating the creative process in advertising and differentiates itself from competitors by using influencers' content as the basis for customized creative material [6][8].
- The platform lets businesses turn popular influencer videos into tailored templates, significantly reducing the time and effort needed to produce marketing materials [9][12].
- Creati's own AI model produces high-quality videos that rival traditional advertising work, attracting major brands such as Shein and Cider [10][11].

Group 2: Market Disruption
- The platform addresses the pain points of both influencers and small businesses: influencers get a stable income stream, and businesses get a simpler creative process [11][12].
- Creati's content generation is designed specifically for e-commerce, reflecting the fact that online retailers have different needs from users of general-purpose video generation tools [18][20].
- Keeping product representation consistent across generated content is a key advantage, particularly for e-commerce businesses [20].

Group 3: Data-Driven Innovation
- Creati uses a data feedback loop to refine its AI creative model, so the model improves continuously based on user engagement metrics [21][22].
- The platform generates customized content based on brand characteristics and audience feedback, which makes it more effective at driving marketing results [21][22].
- Creati's vision includes a creative agent that autonomously generates and optimizes advertising content, potentially revolutionizing the marketing landscape [24][25].

Group 4: Future Aspirations
- The company aims to evolve into a comprehensive creative engine that assists users with many aspects of content creation beyond advertising [29].
- Creati's long-term goal is to integrate advanced technologies, such as brain-computer interfaces, to further enhance its creative capabilities [29][30].
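AI Fever Is Real! miHoYo Sets Up a New Company with 500 Million Yuan
量子位 (QbitAI) · 2025-07-18 00:30
Shiling, from Aofeisi | 量子位, WeChat official account QbitAI

miHoYo has just set up a new wholly owned company, Shanghai miHoYo Wudinggu Technology Co., Ltd. (上海米哈游无定谷科技有限公司), with registered capital of 500 million yuan. Its business scope covers software development and animation and game development, and also extends to areas such as AI application software.

A little earlier, Whispers from the Star, the AI game from Cai Haoyu's startup, opened a playable demo on Steam. In the West, Musk is building an AI girlfriend; in the East, miHoYo is building "Wudinggu" (无定谷). The game's official account also pulled a stunt that rode on Musk's publicity, letting Grok's Ani play the game so that one AI girl talked to another.

500 million in registered capital: miHoYo has never spent this big

Looking back, miHoYo has been positioning itself in AI for a long time, and it has set up AI-related companies before:

Yesterday's 500 million yuan Wudinggu company is its largest AI investment to date, which shows the scale of its ambitions in artificial intelligence. Open miHoYo's official website and the first thing you see is a vision statement:

The statement comes from miHoYo CEO Cai Haoyu. He once said in a talk that miHoYo's goal is "to create, within the next 10 to 30 years, a virtual world like the ones depicted in films such as The Matrix and Ready Player One." To that end, miHoYo moved into AI as early as 2018, setting up its Anti-Entropy research department (逆熵研究部), which has a self-developed large AI model, Glossa. The Anti-Entropy team's best-known work is the digital human Lumi (鹿鸣), who once ...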
ChatGPT Agent Officially Launches, and Several Startup Sectors Had a Sleepless Night
量子位 (QbitAI) · 2025-07-18 00:30
Core Viewpoint: OpenAI has launched ChatGPT Agent, a unified agent that combines thinking and execution, changing how users interact with technology and manage tasks [2][5][8].

Group 1: Features and Capabilities
- ChatGPT Agent can take over operation of an entire computer, functioning almost like a new operating system [3].
- In work scenarios it can schedule meetings, generate presentations, and submit expense reports, much like a senior executive assistant [4].
- In personal scenarios it can plan travel itineraries and manage major events, much like a CEO's personal secretary [4].
- The agent integrates website interaction, high-quality information synthesis, and conversational ability into a single system [10][12].
- Users can schedule tasks to run at fixed times, such as generating a weekly report [19].

Group 2: User Access and Model Training
- Pro, Plus, and Team users can access the new capabilities, and Pro users can run a nearly unlimited number of tasks each month [22][23].
- The model is not entirely new. It is a specialized version of OpenAI's existing models, trained to learn dynamically and optimize how it executes tasks [26][27].
- ChatGPT Agent has achieved state-of-the-art (SOTA) results on several benchmarks, including a score of 41.6 on the challenging Humanity's Last Exam [30][31].

Group 3: Industry Impact and Future Trends
- The launch of ChatGPT Agent marks a major shift in the AI landscape and could reshape how tasks are performed across many sectors [41].
- The concept of the AI agent is evolving, with applications extending beyond simple tasks to more complex, human-like interactions [47][50].
- The rise of AI agents is expected to redefine the internet, moving it from website-centric models toward agent-centric applications [52][55].
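o1 Core Contributor Speaks Out for the First Time After Leaving: AI Is the Strongest Leverage in History, Surpassing Labor, Capital, and Code
量子位 (QbitAI) · 2025-07-17 09:03
Mengchen, from Aofeisi | 量子位, WeChat official account QbitAI

Another core researcher who has left OpenAI is speaking out! Hyung Won Chung, who was just reported to have joined Meta, has shared his thinking on the future of AI: artificial intelligence is becoming the most powerful leverage mechanism in history.

Hyung Won Chung and Jason Wei, who left OpenAI at the same time, are long-time collaborators. Their partnership dates back to their Google Brain days, when the two were co-first authors of the influential model fine-tuning paper "Scaling Instruction-Finetuned Language Models". Jason Wei once praised him:

Hyung Won Chung's ability to recognize a new paradigm and completely abandon any sunk costs left a deep impression on me. At the end of 2022 he realized the power of reinforcement learning, and he has been preaching it ever since.

During his time at OpenAI, Hyung Won Chung contributed to core projects including o1, o1-preview, and Deep Research.

AI as a multiplier of individual ability

Hyung Won Chung began with a flower in bud. He pointed out that humans are naturally bad at noticing slow changes that unfold over years, and that this "defect" may lead us to seriously underestimate the scale of the change AI brings. Artificial intelligence ...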
Transformer in Danger! Google Releases the MoR Architecture: Half the Memory and Double the Inference Speed
量子位 (QbitAI) · 2025-07-17 09:03
Core Viewpoint: Google has introduced a new underlying architecture called Mixture-of-Recursions (MoR), which doubles inference speed while halving KV memory usage, and allocates compute dynamically across tasks within a single framework [1][2][3].

Group 1: MoR Innovations
- MoR combines unified parameter sharing with adaptive recursion depth, addressing the high compute and memory demands of traditional Transformers while maintaining model performance [7][9].
- The architecture uses a recursive Transformer that divides the model into recursion blocks drawing on a shared pool of parameters, which reduces the number of unique parameters and improves distributed training efficiency [10][13].
- A dynamic routing mechanism assigns a different recursion depth to each token, concentrating computation on complex tokens, and KV caching strategies improve memory efficiency [15][19].

Group 2: Performance Comparison
- Experiments compare MoR with the original Transformer and recursive baseline models at parameter scales from 135M to 1.7B. MoR uses nearly 50% fewer parameters while achieving lower validation loss and a higher few-shot accuracy of 43.1% [16][19].
- When training on a fixed 20B tokens, MoR reduces training FLOPs by 25%, training time by 19%, and peak memory usage by 25% [21].
- An analysis of routing strategies shows that expert-choice routing outperforms token-choice routing, highlighting how much routing granularity affects performance [22].

Group 3: Architectural Evolution
- Google has a history of rethinking underlying architectures to reshape computational paradigms, for example with the Mixture of Experts (MoE) model, which trains large models efficiently by activating only a subset of expert networks [27][30].
- MoR is seen as a potential game-changer in the AI landscape, with expectations that it may eventually surpass the Transformer [32].
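The idea of sharing one block across recursion steps while giving each token its own recursion depth can be sketched in a few lines. This is a toy illustration under stated assumptions, not Google's implementation: `shared_block` is a hypothetical stand-in for a shared-parameter Transformer block, `route_depths` is a hand-written heuristic standing in for the learned router, and tokens are plain floats rather than hidden-state vectors.

```python
from typing import List, Tuple


def shared_block(x: float) -> float:
    """Hypothetical stand-in for one recursion block; the same 'parameters'
    (0.5 and 1.0) are reused at every recursion step."""
    return 0.5 * x + 1.0


def route_depths(tokens: List[float], max_depth: int) -> List[int]:
    """Toy router (assumption): larger-magnitude tokens count as 'harder'
    and receive more recursion steps, capped at max_depth."""
    return [1 + min(max_depth - 1, int(abs(t))) for t in tokens]


def mixture_of_recursions(tokens: List[float], max_depth: int) -> Tuple[List[float], List[int], int]:
    """Apply the shared block to each token only for its assigned depth."""
    depths = route_depths(tokens, max_depth)
    outputs = list(tokens)
    block_calls = 0
    for step in range(max_depth):
        for i, depth in enumerate(depths):
            if step < depth:  # tokens that have exited skip this step entirely
                outputs[i] = shared_block(outputs[i])
                block_calls += 1
    return outputs, depths, block_calls


tokens = [0.2, 1.5, 3.7, 0.0]
outputs, depths, block_calls = mixture_of_recursions(tokens, max_depth=3)
full_depth_calls = len(tokens) * 3  # cost if every token used every step
```

With these toy inputs only the "hard" token runs all three steps, so the routed version makes fewer block calls than running every token at full depth. The same intuition underlies the reported compute savings, although the real router is learned and operates on hidden states.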
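Human Beats OpenAI to Defend the Programming Title! Two Comebacks in a 10-Hour Battle, and the AI Falls Short at the Last Moment
量子位 (QbitAI) · 2025-07-17 07:04
Baijiao, from Aofeisi | 量子位, WeChat official account QbitAI

A 10-hour battle! A human pulled ahead at the last moment and won the programming finals.

OpenAI ranked first for most of the contest, and it looked as if that would be the result. Then the human began to overtake it. With 1 hour and 20 minutes remaining, OpenAI regained the lead, but it could not hold on to the end.

Standings: Exhibition with OpenAI

| Rank | User | Score |
| --- | --- | --- |
| 1 | OpenAIAHC | 43542614363 |
| 2 | Psyho | 42420277629 |
| 3 | terry_u16 | 34248482621 |
| 4 | nikaj | 33740582721 |
| 5 | saharan | 31754963614 |

OpenAI president Greg Brockman sent his congratulations and slipped in a plug of his own: OpenAI placed second.

The human champion, meanwhile, said he was completely exhausted:

I think I have slept only about 10 hours over the past three days, and I can barely keep going.

OpenAI, which had held the lead for most of the contest, finished second in the end.

At the just-concluded AtCoder World Tour Finals, 12 ...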
The Claude Code Leads Who Left Are Back! Anthropic: My Revenue Has Surged 5.5x in the Past Two Months, Don't Go
量子位 (QbitAI) · 2025-07-17 07:04
Core Viewpoint: The article discusses the rapid return of key personnel Boris Cherny and Cat Wu to Anthropic from Cursor, highlighting the competition for talent in Silicon Valley and what it means for Anthropic's valuation and growth potential in the AI sector [1][6][7].

Group 1: Personnel Movements
- Boris Cherny and Cat Wu, key figures behind Claude Code, were initially recruited by Anysphere, the company behind Cursor, where they were set to develop "agent-like" functionality [2][4][5].
- Just two weeks after leaving, both were lured back to Anthropic, which indicates the company's strength in retaining talent amid fierce competition [6][7].

Group 2: Valuation and Financial Performance
- Anthropic is reportedly in discussions for a new funding round at a target valuation of $100 billion, a significant increase from its previous valuation of $58 billion just four months earlier [8][9][10].
- The company aims to improve its profitability: gross margin on direct sales of AI models is currently around 60%, against a target of 70% [12][19].

Group 3: Revenue Growth and Market Strategy
- Anthropic's revenue grew fourfold in the first half of the year, with annualized revenue exceeding $4 billion [20].
- The company is pursuing a "model-as-a-service + vertical solutions" strategy, offering tailored AI solutions for industries including finance, law, and healthcare [15][19].

Group 4: Product Development and User Engagement
- The launch of Claude Code has significantly boosted engagement, with a 300% increase in active users and 5.5-fold revenue growth since the release of the Claude 4 series [21][26].
- Anthropic has introduced an analytics dashboard for Claude Code that lets enterprises track their AI spending and usage metrics [24][25].

Group 5: Investment and Future Prospects
- Amazon is reportedly considering a new multi-billion-dollar investment in Anthropic that could make it the largest shareholder, following a previous investment of $4 billion [28][31].
- The investment reflects a broader trend of companies recognizing the long-term profit potential of AI technologies beyond the initial hype [32].
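Apple Gives In to the NVIDIA Ecosystem! The MLX Framework Adds CUDA Support
量子位 (QbitAI) · 2025-07-17 05:52
Yishui, from Aofeisi | 量子位, WeChat official account QbitAI

Apple has given in to the NVIDIA ecosystem! According to the latest news, MLX, the framework Apple built specifically for training on-device AI models, is adding CUDA support. The news sparked heated discussion on Hacker News as soon as it broke:

Apple has long been known for being "closed", but with NVIDIA's CUDA ecosystem holding an absolutely dominant position in AI development, Apple has had to change its stance. Add NVIDIA's unprecedented $4 trillion market capitalization and its recent run of good news, and it is not hard to see why Apple has chosen not to confront it head-on. In effect, Apple is openly riding NVIDIA's momentum to win a bigger share of the AI market.

CUDA is too strong not to embrace

Why embrace CUDA? Simply because it is too strong, and Apple says so itself. The official reasons are as follows:

(1) Unified memory support: CUDA provides a unified memory mechanism that makes it easier to share and migrate data between devices, improving development efficiency and performance.
(2) Cross-platform deployment needs: NVIDIA hardware is widely used in academic research and large-scale computing. Supporting CUDA lets developers build and test locally on a Mac and then deploy seamlessly to servers or supercomputers equipped with NVIDIA GPUs.

With MLX adapted to CUDA, Apple developers will also be able to train models on NVIDIA GPUs. In essence, the change adds backend support for CUDA, making it easier ...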
The Cloud Computing Leader Just Redefined How AI Agents Are Done
量子位 (QbitAI) · 2025-07-17 05:52
Core Viewpoint: Amazon Web Services (AWS) has redefined how AI Agents are deployed in production with the launch of Amazon Bedrock AgentCore, a comprehensive toolkit for building enterprise-grade AI Agents [3][19].

Group 1: Amazon Bedrock AgentCore
- AgentCore simplifies AI application development by managing the various components through one unified system, making the process more efficient [5][16].
- It includes seven core services that address the complexity of deploying AI Agents, and is likened to a fully furnished apartment that is ready to move into [6].
- The seven services are Runtime, Memory, Observability, Identity, Gateway, Browser, and Code Interpreter, each designed to enhance the functionality and security of AI Agents [8][9][10][11][12][13][14].

Group 2: AI Agents and Tools Marketplace
- AWS has introduced a new AI Agents and tools category in AWS Marketplace, where customers can find solutions by describing their use case in natural language [24].
- The initiative aims to speed up deployment and testing of AI Agents in a range of business scenarios [69].

Group 3: Amazon Nova and Kiro
- AWS has launched customization for Amazon Nova across the model training lifecycle, making AI applications more flexible [26][29].
- Kiro, a new AI programming tool, turns ideas into working software through a structured process, streamlining the development workflow [49][51][65].

Group 4: S3 Vectors
- Amazon S3 Vectors is a cloud storage service optimized for large-scale vector datasets that reduces storage costs by up to 90% [38][40].
- It supports efficient querying and integrates with other AWS services, strengthening AI Agents' data management capabilities [47].

Group 5: Market Trends and Future Outlook
- A significant shift toward AI Agents is under way: more than 50% of companies have deployed them in production, and Gartner predicts that by 2028, 33% of enterprise software will incorporate Agentic AI [71][72].
- The emphasis on AI Agents reflects a broader technology trend, with expectations that they will transform work and life on a scale comparable to the arrival of the internet [73].