量子位
Mining Motion Cues in Attention: Unlocking 4D Scene Reconstruction Without Training
量子位· 2025-12-17 09:07
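Contributed by the VGGT4D team. 量子位 | WeChat official account QbitAI. How can 3D foundation models trained on static scenes gain the ability to handle dynamic 4D scenes without any added training cost? A research team from the Hong Kong University of Science and Technology (Guangzhou) and Horizon Robotics proposes VGGT4D. By analyzing the internal mechanisms of the Visual Geometry Grounded Transformer (VGGT), the work discovers and exploits motion cues hidden in its attention layers. The core question behind VGGT4D: can 4D perception be mined directly from a pretrained 3D foundation model, with no additional training? As a training-free framework, VGGT4D achieves strong performance on dynamic object segmentation, camera pose estimation, and long-sequence 4D reconstruction. The challenge of moving from 3D to 4D: in recent years, 3D foundation models such as VGGT and DUSt3R have performed well on static scene reconstruction. On dynamic 4D scenes that contain moving objects (such as pedestrians and vehicles), however, their performance often drops significantly. Object motion not only interferes with background geometry modeling but also causes severe camera pose drift. Existing solutions typically face two kinds of challenges. High computational or training cost: reliance on heavy test-time ...
量子位 Is Hiring Editors and Writers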
量子位· 2025-12-17 09:07
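From the editorial team at 凹非寺 (Aofeisi). 量子位 | WeChat official account QbitAI. The AI boom is still surging, but if you don't yet know how to get involved, why not come to 量子位? We are a content platform centered on tracking new progress in AI. After 8 years of growth, we now have top-tier influence, broad and well-recognized industry resources, and one of the best positions for observing and learning from the current technology wave. We are currently hiring in three directions and hope you are (or can become) a content expert in one of them: AI industry, which focuses on innovation at the infrastructure layer, including chips, AI Infra, and cloud computing; AI finance, which focuses on venture capital and earnings reports in the AI field and tracks capital movements along the industry chain; AI products, which focuses on progress in AI applications and hardware devices. All positions are full-time and based in Zhongguancun, Beijing. Positions are open to: experienced hires, covering the editor, lead writer, and editor-in-chief levels, matched by ability; campus hires, meaning fresh graduates, with internships accepted and conversion to full-time possible. Openings at every seniority level are available for all roles, and you are welcome to apply based on your own background and experience. By joining us, you can: Stand at the crest of the AI wave: be among the first to encounter and understand the latest AI technologies and products, and build a complete understanding of AI. Master new AI tools: apply new AI technologies and tools in your work to improve efficiency and creativity. Build personal influence: by writing exclusive original ...
NVIDIA's Moat Just Got Wider! It Quietly Acquires a Leading Open-Source Compute Scheduling Tool Used by More Than Half of the World's Top Supercomputers, and Thinking Machines Can't Do Without It Either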
量子位· 2025-12-17 03:38
Core Viewpoint - NVIDIA's acquisition of SchedMD is seen as a strategic move to expand its competitive edge in the HPC and AI sectors by integrating SchedMD's Slurm system into its ecosystem, thereby enhancing its influence beyond hardware to resource scheduling [1][11][15]. Group 1: SchedMD Overview - SchedMD, founded in 2010, specializes in large-scale computing task scheduling technology [5]. - The core asset of SchedMD is the open-source workload management system Slurm, which efficiently allocates computing resources across numerous devices for tasks such as AI model training and scientific research [6][8]. - Slurm is utilized by over half of the TOP500 supercomputers globally, as well as by major tech companies like Meta and various AI startups [3][9][10]. Group 2: Strategic Importance of the Acquisition - The acquisition allows for low integration costs due to a decade-long collaboration between NVIDIA and SchedMD, facilitating a smooth transition of technology and team integration [12][13]. - The strategic value of the acquisition lies in extending NVIDIA's influence from hardware to scheduling, ensuring that even clients using AMD or Intel chips will rely on NVIDIA's ecosystem through Slurm [14][15]. - This move further solidifies NVIDIA's position among key customer groups, including supercomputing centers, cloud providers, and AI enterprises [16]. Group 3: Future Considerations - NVIDIA has committed to maintaining Slurm's open-source and vendor-neutral attributes, ensuring continued access for global users [18]. - However, there are concerns regarding NVIDIA's ongoing investment in critical projects like Slinky, which supports Slurm-on-Kubernetes services, raising questions about the future stability of related businesses [19][21].
Google Empowers Global Expansion End to End: A 3-Person Team Orchestrating a Thousand Agents Could Become a Unicorn | MEET2026
量子位· 2025-12-17 03:38
Core Insights - The future will be characterized by autonomous collaboration among intelligent agents, solving complex problems, automating workflows, and autonomously issuing tasks, creating a new business model [1] - AI agents are becoming new productivity units, injecting new meaning into the globalization logic of startups [2] - The intelligent agent sector is just beginning, with significant changes expected in the next one to two years, presenting a major opportunity for Chinese startups to go global [3] Google’s Integrated Solutions for Startups - Google has launched AI-driven integrated solutions to empower startups for efficient globalization [4] - The MEET2026 conference attracted nearly 1,500 offline attendees and over 3.5 million online viewers, highlighting the significant interest in the topic [6] - Startups face various challenges during globalization, and Google’s ecosystem can support them at every stage [7] Stages of Startup Globalization - The five stages of startup globalization include: 1. **Ideation and Strategic Planning**: Founders gather information and analyze competitors, often using Gemini for market research [8] 2. **Product Launch**: Google Cloud provides stable cloud infrastructure support [9] 3. **Market Validation**: Google Ads assists in reaching target customers [9] 4. **Market Expansion**: Google Play and other services support expansion into new markets [9] 5. **IPO Maturity**: Google’s data analysis tools aid in the final push before going public [10] Challenges and Innovations in AI - The AI field is evolving rapidly, with challenges such as hallucination (inaccurate or fabricated information) being addressed through better model training and engineering practices [11] - The introduction of the A2A (Agent-to-Agent) protocol aims to facilitate communication between intelligent agents across different enterprises [16] - The shift from SaaS subscription models to outcome-based payment models reflects a fundamental change in business logic, allowing small teams to scale significantly [18] Gemini's Evolution and Capabilities - Gemini has evolved from its initial version to Gemini 3, which has achieved significant advancements in reasoning, understanding, and problem-solving capabilities [15] - Key capabilities of Gemini 3 include: 1. **Extended Context Window**: Supports 1 million tokens, emphasizing the importance of context engineering [21] 2. **Native Multimodal Capability**: Understands text, video, images, and audio with improved clarity and accuracy [22] 3. **Function Calling Ability**: Enables intelligent agents to utilize external tools and services [23] - Gemini 3 is considered the safest model to date, having undergone comprehensive safety assessments [24]
Every Company Is Using AI Agents, but Does Anyone Really Know How to Use Them? | MEET2026 Roundtable
量子位· 2025-12-17 01:04
Core Insights - The article discusses the evolution of AI Agents, emphasizing that a significant milestone will be reached when two out of three most frequently used apps by individuals are AI Agents [1][72] - Key metrics for evaluating a good AI Agent include controllability, explainability, and the ability to execute tasks consistently and stably [1] - Many AI Agents currently face negative gross margin issues, where the cost of completing tasks exceeds users' willingness to pay, posing a challenge for entrepreneurs [2][49] Group 1: Industry Perspectives - The year 2025 is anticipated to be the "Year of the Agent," marking the initial deployment of AI Agents in standardized scenarios such as customer service and claims processing, validating their technical feasibility and value [1][4] - The industry faces the challenge of aligning technology, product, and business models to create a sustainable positive feedback loop for AI Agents [2][4] - The roundtable discussion featured insights from industry leaders, highlighting the need for a rational and pragmatic approach to the widespread application of AI Agents across various sectors [3][10] Group 2: Product Development and Use Cases - AI Agents are evolving from simple tasks to more complex functions, such as creating presentations and coding, demonstrating significant advancements in their capabilities [23][25] - Successful implementations of AI Agents have shown ROI improvements, particularly with the advent of multimodal models that enhance understanding of images and videos [20][21] - The development of coding agents has progressed from writing code to executing entire workflows, resulting in efficiency gains of 3 to 5 times in software engineering tasks [25][35] Group 3: Key Challenges and Future Directions - A major challenge for AI Agents is the discrepancy between operational costs and user payment willingness, which hinders scalability for many startups [49] - The future evolution of AI Agents will likely focus on enhancing reliability and integrating them into physical environments, requiring advancements in both foundational models and engineering capabilities [56][57] - The industry anticipates a significant increase in AI Agent penetration in 2026, driven by major investments from leading tech companies and the emergence of user-friendly applications [58][61]
Overtaking Nano Banana! OpenAI's Flagship Image Generation Model Goes Live
量子位· 2025-12-17 01:04
Core Viewpoint - OpenAI has launched its new image generation model, GPT-Image-1.5, which aims to enhance practical usability and compete directly with other leading models in the market [2][13][14]. Summary by Sections Model Features - The new model introduces four main highlights: improved instruction adherence, precise editing, better detail retention, and a speed increase of up to four times compared to its predecessor [3][5][14]. - GPT-Image-1.5 is designed to maintain consistency in key elements such as lighting, composition, and character appearance during input, output, and multi-round editing [15][19]. Performance and Comparisons - In benchmark tests, GPT-Image-1.5 has been rated first in both text-to-image and image editing categories, surpassing the Nano Banana Pro [33]. - The model's instruction adherence rate is reported to be as high as 90%, indicating a significant lead over competitors [35]. Pricing and Accessibility - The API for GPT-Image-1.5 has seen a 20% reduction in input and output costs compared to the previous version [39]. - Pricing varies by resolution, with high-quality images costing approximately $133 per thousand and low-quality images around $9 per thousand [40]. Market Positioning - OpenAI is positioning GPT-Image-1.5 as a productivity tool with its focus on fine editing capabilities and reduced pricing, indicating a strategic shift towards enhancing practical applications [41]. - The model is now available to all ChatGPT users and API users globally, marking a significant step in OpenAI's product offerings [38].
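As a rough arithmetic check on the pricing quoted above, the per-thousand figures can be converted into a per-image cost. This is only a minimal sketch: the function name `per_image_cost` is illustrative and not part of any OpenAI API, and the prices are the approximate figures from the summary, not an official price list.

```python
def per_image_cost(price_per_thousand_usd: float) -> float:
    """Convert a price quoted per 1,000 images into a per-image price."""
    return price_per_thousand_usd / 1000


# Approximate figures quoted in the summary above (USD per 1,000 images).
# These are assumptions taken from the text, not official pricing.
QUOTED_PRICES = {"high": 133.0, "low": 9.0}

for quality, price in QUOTED_PRICES.items():
    print(f"{quality}-quality: ${per_image_cost(price):.3f} per image")
```

Under these quoted figures, a high-quality image works out to roughly $0.133 and a low-quality image to roughly $0.009.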
Giving Agents a "Hippocampus"! Shanghai AI Lab Open-Sources MemVerse, Defining a New Paradigm for Multimodal Memory
量子位· 2025-12-16 11:52
Core Insights - The article emphasizes the need for a multi-modal memory system for AI agents, moving beyond traditional text-based memory to a more complex, experience-based memory framework [1][2][4] Group 1: Multi-Modal Memory Framework - MemVerse is introduced as the first general multi-modal memory framework for AI agents, integrating images, audio, and video with text into a unified semantic space [1][4] - The framework features a "dual-path" architecture and "memory distillation" technology, enabling AI agents to possess lifelong memory capabilities that are responsive and adaptable [1][4][10] Group 2: Performance and Efficiency - MemVerse has demonstrated significant performance improvements in benchmark tests, such as a nearly 9 percentage point increase in the ScienceQA score for GPT-4o-mini, from 76.82 to 85.48 [8] - In video retrieval tasks, MemVerse outperformed traditional methods like CLIP (29.7% recall rate) and specialized models such as ExCae (67.7%) and VAST (63.9%) [8] - The system can reduce token consumption by up to 90% while maintaining high accuracy, significantly lowering operational costs and delays for long-term memory [8][9] Group 3: Memory Architecture - MemVerse's architecture mimics human cognitive processes, consisting of a central coordinator, short-term memory (STM), and long-term memory (LTM) [6][11] - The central coordinator actively manages memory interactions, enhancing the agent's ability to make intelligent decisions rather than relying on passive data retrieval [11] - The LTM is structured into core memory (user profiles), situational memory (event timelines), and semantic memory (abstract concepts), facilitating deep associative reasoning and addressing "hallucination" issues [11] Group 4: Open Source and Community Engagement - The project has been open-sourced by the Shanghai Artificial Intelligence Laboratory, inviting developers to experiment with the framework [12]
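The ScienceQA figures above can be sanity-checked by separating the absolute gain (in percentage points) from the relative gain. A minimal sketch, assuming only the two scores quoted in the summary; the helper names are illustrative and not part of MemVerse.

```python
def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points: the simple difference of two scores."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Gain relative to the starting score, expressed as a percentage."""
    return (after - before) / before * 100


# Scores quoted in the summary for GPT-4o-mini on ScienceQA.
before, after = 76.82, 85.48
print(f"absolute: {absolute_gain(before, after):.2f} points")
print(f"relative: {relative_gain(before, after):.1f}%")
```

The difference is 8.66 percentage points, consistent with the "nearly 9" wording, and corresponds to a relative improvement of about 11.3% over the starting score.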
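Deploying Enterprise-Grade Agents: Who Hasn't Hit These Four Major Pitfalls? A Systematic Solution from Infinigence AI (无问芯穹) Is Here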
量子位· 2025-12-16 11:52
Core Viewpoint - The article discusses the challenges and opportunities in the implementation of AI agents in enterprises, emphasizing the need for a robust infrastructure to support their effective deployment and operation [4][52][63]. Group 1: Current State of AI Agents - AI agents have been integrated into many workflows but are often perceived as having only intern-level capabilities [2][3]. - Many teams use AI agents for automation but do not fully trust them with core responsibilities [3][4]. - The focus in the industry is shifting from merely achieving model performance to addressing engineering and application scenarios for enterprise-level deployment [4][52]. Group 2: Challenges in AI Agent Implementation - Enterprises face four common pitfalls when deploying AI agents: effectiveness issues, stability during scaling, rising costs, and difficulties in establishing a commercial loop [8][21]. - Effectiveness issues arise from various factors such as model selection and prompt design, leading to performance degradation over time [11][12][13]. - Stability problems become apparent when AI agents transition from small-scale trials to real business environments, resulting in task delays and errors [14][15]. - Despite expectations, AI agents have not significantly reduced costs, with high token usage leading to expenses of 20-50 yuan for large model calls [16][17][18]. - Establishing a commercial loop requires AI agents to integrate into product flows and payment systems, which many current solutions lack [19][20]. Group 3: Solutions Offered by Infinigence AI (无问芯穹) - Infinigence AI's agent service platform aims to address the systemic gaps in AI agent deployment [25][26]. - The platform provides a comprehensive solution that includes templates for various AI capabilities, allowing enterprises to avoid trial-and-error during initial implementation [28]. - It offers stability and scalability through robust technical support and system resilience, significantly improving operational efficiency [32][33]. - Cost management is enhanced through deep integration of model optimization and hardware collaboration, allowing enterprises to control expenses effectively [36][37][39]. - The platform facilitates commercial viability by connecting AI agents with external tools and payment systems, streamlining the integration process [41][42]. Group 4: Future Trends and Organizational Changes - The article predicts that as AI agents become more prevalent, enterprises will need to adapt their organizational structures to accommodate multiple agents working collaboratively [55][56]. - The competitive edge will increasingly depend on the number and quality of AI agents and their collaborative systems within organizations [60][61]. - The infrastructure for AI agents will be crucial for differentiating enterprises in the market, akin to the foundational systems that support vehicles [61][62]. - Infinigence AI positions itself as a provider of this essential infrastructure, focusing on creating a solid foundation for enterprise-level AI agent deployment [63][67].
量子位 Is Hiring Editors and Writers
量子位· 2025-12-16 11:52
Core Viewpoint - The article emphasizes the ongoing AI boom and invites individuals to join 量子位 (QbitAI) to track AI advancements and become content experts in various AI-related fields [1]. Group 1: Job Opportunities - The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4]. - Positions are full-time and based in Beijing, with opportunities for editorial roles at various levels, including editor, lead writer, and chief editor [6]. Group 2: Job Responsibilities - **AI Industry Direction**: Focus on innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6]. - **AI Finance Direction**: Track venture capital and financial reports in the AI sector, monitoring capital movements within the industry [6]. - **AI Product Direction**: Monitor advancements in AI applications and hardware, including software products and terminal technologies [6]. Group 3: Benefits and Growth - Employees will have access to the latest AI technologies and tools, enhancing work efficiency and creativity [6]. - The company offers a vibrant team environment, professional mentorship, and competitive compensation packages, including various benefits [6][12]. - The company aims to build personal influence through original content creation and networking opportunities with industry leaders [6]. Group 4: Company Overview - As of 2025, 量子位 has over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12]. - It is recognized as the top new media outlet in the AI and frontier technology sector according to third-party data platforms [12].
QQ Music, You've Changed: You Can Now Compose an Original Song, 《大东北》, for Free on an AI PC
量子位· 2025-12-16 11:52
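By Jin Lei (金磊), from 凹非寺 (Aofeisi). 量子位 | WeChat official account QbitAI. Is your QQ Music still only good for listening to songs? Take note: it now offers another way to use it, namely AI songwriting. And it is free. First, click the AI songwriting button in the top-left corner of QQ Music. Then type in your inspiration for the song, choose the "pop" style, and click "AI quick creation". The AI first generates complete lyrics, including an intro, verses, and a chorus. In just a few minutes, an original song, 《大东北》, is born. What's more, QQ Music's AI composition feature is free only on an AI PC. Some readers may ask: can't other AI composition websites or apps already do this? Not quite. Composing on an AI PC works in a completely different way, because it runs a local large model and performs inference locally. It also addresses the pain point of ordinary people who want to express themselves but cannot compose: as long as you have an idea, a single sentence, at no cost, immediately turns it into a unique melody of your own, so anyone can become a creator. And it is not just music: creative work on an AI PC now spans all kinds of applications, and their "AI content" is remarkably high. Even professionals can use an AI PC to create a demo track within minutes, bringing the barrier to finding inspiration and creating ...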