QbitAI (量子位)
The first AI browser company to be acquired: a 4.3 billion RMB deal, with the product still in beta
QbitAI · 2025-09-05 06:33
Core Viewpoint - Atlassian's $610 million acquisition of The Browser Company is the first acquisition of an AI browser company. The deal centers on Dia, a newly developed AI browser aimed at making white-collar workers more productive [1][3][12].
Group 1: Acquisition Details
- The Browser Company, known for its browsers Arc and Dia, was acquired by Atlassian for $610 million (approximately 4.3 billion RMB) [1].
- The deal was long in the making: the two CEOs began talking a year earlier, and those talks initially focused on Arc rather than Dia [8][10].
- The Browser Company has raised a total of $128 million since its founding in 2019 and was valued at $550 million last year [17][19].
Group 2: Market Context and Reactions
- Online commenters have questioned Atlassian's judgment, given that Dia has been in beta testing since its launch in June [5][6].
- Despite the skepticism, the acquisition is seen as a strategic move that secures the resources and distribution channels needed in a competitive AI browser market [22][23].
Group 3: Product Focus and Vision
- Atlassian's founder, Cannon-Brookes, envisions Dia as a browser built for getting work done rather than for merely browsing information, one that integrates various tools and raises user productivity [25][26].
- Dia is optimized for commonly used SaaS applications and supplies contextual information to help with daily tasks [27].
- The browser connects AI capabilities with personal work memory, so that applications and tasks fit together better [29].
Jensen Huang has it all figured out! Spending $1.5 billion to rent its own GPUs and teaching a protégé to raise financing against GPUs: another Nvidia "heir" is reportedly preparing an IPO
QbitAI · 2025-09-05 06:33
Core Viewpoint - Nvidia is investing heavily in cloud computing by renting its own AI chips back from Lambda, a strategic move to strengthen its dominance in the cloud market [1][2][25].
Group 1: Nvidia's Investment and Rental Agreements
- Nvidia will lease 10,000 GPU servers equipped with its own AI chips from Lambda for four years, for a total of $1.3 billion [2].
- Nvidia has also signed a second rental agreement covering 8,000 servers, valued at $200 million [3].
- The rentals are meant to serve Nvidia's internal research and development needs [4].
Group 2: Lambda's Role and IPO Preparation
- Lambda is preparing for an IPO, potentially as early as the first half of 2026 [7][23].
- As a cloud provider, Lambda offers competitive pricing for GPU rentals, especially for long-term or large-scale usage [11].
- Nvidia is Lambda's supplier, investor, and customer at once, which makes the relationship symbiotic [10].
Group 3: Financial Strategies and Market Positioning
- Nvidia took part in Lambda's $480 million Series D funding round in February 2025 as a strategic investor [14].
- Lambda also secured $500 million in debt financing to purchase Nvidia GPUs, with the GPUs themselves serving as collateral [14].
- Nvidia is deepening ties with smaller cloud providers to keep its chips widely deployed, countering larger cloud firms that are developing their own chips [30][28].
Group 4: Competitive Landscape and Future Outlook
- Nvidia's data center business is a major growth driver, contributing $41.1 billion in revenue in the second quarter of fiscal 2026, a 56% year-over-year increase [25].
- The company aims to keep its leading position in the computing market by supporting smaller cloud firms such as Lambda and CoreWeave [31][32].
- CoreWeave, another Nvidia-backed cloud provider, recently went public and saw its stock price surge, which reflects well on Nvidia's investment strategy [22][21].
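As a quick sanity check on the lease figures reported above, the two agreements can be added up and the four-year deal broken down per server. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the per-server figure assumes the $1.3 billion is spread evenly across all 10,000 servers and all four years, which the summary does not state.

```python
# Figures as reported in the summary above; variable names are illustrative.
lease_main_usd = 1_300_000_000   # 10,000 servers over four years
lease_extra_usd = 200_000_000    # 8,000 servers; term not given in the summary

main_servers = 10_000
main_years = 4

# Combined value of both rental agreements.
total_usd = lease_main_usd + lease_extra_usd

# Assumption: cost is spread evenly over servers and years.
per_server_per_year = lease_main_usd / main_servers / main_years

print(f"total: ${total_usd / 1e9:.1f}B")
print(f"four-year deal, per server per year: ${per_server_per_year:,.0f}")
```

The combined total matches the $1.5 billion in the headline. No per-server figure is computed for the second agreement because its duration is not given.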
ByteDance Seed's latest native agent is here! One model handles autonomous operation of phones, computers, and browsers
QbitAI · 2025-09-05 04:28
Core Viewpoint - ByteDance's UI-TARS-2 is a new generation of AI agent that autonomously operates graphical user interfaces (GUIs) across platforms and outperforms competing models from Claude and OpenAI [2][23][24].
Group 1: UI-TARS-2 Overview
- UI-TARS-2 is designed to complete complex tasks autonomously on computers, mobile devices, web browsers, terminals, and even in games [6][10].
- The architecture comprises a unified agent framework, multimodal perception, multi-turn reinforcement learning, and hybrid operation flows [7][8].
Group 2: Challenges Addressed
- UI-TARS-2 tackles four major challenges in AI GUI operation: data scarcity, environment fragmentation, single-purpose capability, and training instability [5][10].
- To address data scarcity, the model uses a "data flywheel" strategy: it collects raw data and generates high-quality, task-specific data through iterative training [11][12].
Group 3: Reinforcement Learning Enhancements
- The team adapted traditional reinforcement learning for stable operation in long-horizon GUI tasks by improving task design, reward mechanisms, and the training process [15][17].
- The model uses asynchronous rollout and several enhancements to the PPO algorithm to improve stability and to encourage exploration of less common but potentially effective actions [17][18].
Group 4: Performance Metrics
- UI-TARS-2 scored higher than Claude and OpenAI models in GUI tests spanning different operating systems and command-line environments [23][24].
- In gaming scenarios, UI-TARS-2 reached an average of roughly 60% of human performance and outperformed competitors in several games [27][28].
Group 5: Practical Applications
- Beyond GUI operations, UI-TARS-2 can handle tasks such as information retrieval and code debugging, which makes it more versatile than models that rely solely on GUI interaction [28][29].
A new ChatGPT feature wipes out another batch of startup projects
QbitAI · 2025-09-05 04:28
Core Viewpoint - ChatGPT has introduced a new feature called "Conversation Branching," which lets users pursue multiple conversation threads without cluttering the original dialogue [1][3][4].
Group 1: Conversation Branching Feature
- With "Conversation Branching," users click a button to start a new topic based on the existing conversation [4][8].
- The feature allows separate discussions while keeping the context of the original topic [12][13].
- It is seen as a response to user feedback asking for better-organized conversation management [3][13].
Group 2: Project Functionality
- ChatGPT has made the previously paid "Project" feature free for all users [16].
- File uploads in "Project" are capped by subscription level: up to 5 files for free users, 25 for Plus users, and 40 for Pro users [19].
- Users can customize project colors and icons, making projects easier to tell apart at a glance [18].
OpenAI announces an AI online recruitment platform, picking a fight with Microsoft's LinkedIn
QbitAI · 2025-09-05 01:49
Core Viewpoint - OpenAI is launching an AI-driven online recruitment platform, the OpenAI Jobs Platform, which matches corporate needs with employee skills and competes directly with LinkedIn [2][11][12].
Group 1: OpenAI Jobs Platform
- The OpenAI Jobs Platform will give small businesses and local governments a dedicated channel for reaching top AI talent [5].
- The platform aims to connect skilled individuals with organizations that need AI expertise, making local businesses more competitive and improving government services [16][17].
- OpenAI is working with various organizations, including Walmart and local government offices, to build the platform [14][15].
Group 2: AI Skills Development
- OpenAI has launched OpenAI Academy, a free online learning platform that has already helped over 2 million people acquire AI skills [18].
- The company plans to expand the Academy with certification courses for different AI proficiency levels, aiming to certify the AI skills of 10 million Americans by 2030 [20][21].
- Research indicates that employees with AI skills are more valuable and efficient and earn higher salaries than those without such skills [18][22].
Group 3: Competitive Landscape
- The new recruitment platform directly challenges LinkedIn, which is owned by Microsoft, OpenAI's largest financial backer [11][12].
- The competition raises questions about potential conflicts of interest, since OpenAI's success could erode LinkedIn's market position [12][13].
DeepSeek's next big move revealed: agents are next
QbitAI · 2025-09-05 01:49
Core Viewpoint - DeepSeek is reportedly developing a new model with enhanced AI agent capabilities, expected to launch by the end of this year [3][8].
Group 1: Model Development
- DeepSeek's August update introduced DeepSeek-V3.1, which improves agent capabilities through post-training optimization and performs better on tool use and agent tasks [5][11].
- The upcoming model is designed to carry out complex operations from minimal prompts and to evolve on its own based on its past actions [7][8].
- Moving from DeepSeek V3 to V3.1 took nine months, which indicates a focus on incremental improvements rather than major version changes [9][10].
Group 2: Performance Metrics
- DeepSeek-V3.1 shows significant improvements over its predecessors on several benchmarks [12]:
  - SWE-bench: 66.0 (V3.1) vs. 45.4 (V3) and 44.6 (R1)
  - SWE-bench Multilingual: 54.5 (V3.1) vs. 29.3 (V3) and 30.5 (R1)
  - Terminal-Bench: 31.3 (V3.1) vs. 13.3 (V3) and 5.7 (R1)
- In search agent evaluations, V3.1 also improved on R1 across the board [12].
Group 3: Future Outlook
- The release of DeepSeek R1 significantly influenced the global large model industry and marked a pivotal moment in its development [15].
- The concept of AI agents is gaining traction, with predictions that by mid-2025, nearly all large model products will incorporate agent functionalities [16][18].
- There is speculation that price barriers for AI agents could fall if DeepSeek leads this push [19].
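To make the size of the jump concrete, the benchmark figures reported above can be turned into absolute point gains. The snippet below is a minimal sketch that only restates the three sets of scores from the summary; the dictionary layout and the helper name `gains` are illustrative, not part of any DeepSeek tooling.

```python
# Scores as reported in the summary above, in the order (V3.1, V3, R1).
scores = {
    "SWE-bench": (66.0, 45.4, 44.6),
    "SWE-bench Multilingual": (54.5, 29.3, 30.5),
    "Terminal-Bench": (31.3, 13.3, 5.7),
}


def gains(v31, v3, r1):
    """Return V3.1's absolute point gain over V3 and over R1 (one decimal)."""
    return round(v31 - v3, 1), round(v31 - r1, 1)


deltas = {name: gains(*row) for name, row in scores.items()}

for name, (over_v3, over_r1) in deltas.items():
    print(f"{name}: +{over_v3} vs V3, +{over_r1} vs R1")
```

On these numbers every gain is at least 18 points, with the largest over R1 on Terminal-Bench.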
Nvidia's Jensen Huang has acquired an AI coding company
QbitAI · 2025-09-05 01:49
Core Viewpoint - Nvidia is expanding its AI programming ecosystem through strategic acquisitions, most recently the AI coding startup Solver, which develops AI agents for software programming [2][8][17].
Group 1: Acquisition Details
- Nvidia has acquired Solver, an AI coding company founded in 2022 that aims to manage entire codebases rather than just complete code [8][12][22].
- Solver's founders, Mark Gabel and Daniel Lord, both have significant AI backgrounds: Gabel is a former chief scientist at Viv Labs, and Lord is a co-founder of Siri [10][11].
- The acquisition fits Nvidia's strategy of building a software ecosystem around its leading AI hardware, which could shorten enterprise development cycles on Nvidia's platform [17][23].
Group 2: Previous Acquisitions
- Over the past two years, Nvidia has made several acquisitions to lower the cost of using its chips and to strengthen AI support, including:
  - Lepton AI, a company that rents out servers powered by Nvidia chips [18][19].
  - Gretel, a synthetic data startup acquired in March 2025 to meet AI training data needs [20].
  - Run:ai, an Israeli software provider focused on AI workload orchestration, acquired for $700 million in December 2024 [20].
  - OctoAI, which specializes in generative AI tools, acquired for approximately $250 million in September 2024 [20].
  - Brev, a platform for building and deploying AI models, acquired in July 2024 to improve access to Nvidia GPUs in the cloud [20].
Group 3: Implications of the Acquisition
- The Solver acquisition signals a shift toward AI agents that play a more integral role in software development, moving beyond code completion to taking part in building, testing, and managing codebases [22][23].
- The move is part of Nvidia's ongoing "AI acquisition spree," which extends its business from chips and data tools to AI agents and deepens its industry footprint [23][24].
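Breaking the "expert dilemma" in embodied intelligence! A new method from Peking University lets the Unitree G1 master dancing and cartwheels with a single framework
QbitAI · 2025-09-05 01:49
Contributed by the BumbleBee team | QbitAI (WeChat official account: QbitAI)
Humanoid robots are getting better and better at dancing.
Take the Charleston, for example: one minute and forty seconds of silky-smooth swinging, as steady as if keeping time to a metronome:
But can they switch freely between different movement modes such as dancing, gymnastics, and everyday manipulation, the way humans do?
The BumbleBee system, jointly developed by Peking University and the BeingBeyond team, offers the latest answer: with an innovative three-stage "divide, refine, fuse" architecture, the system achieves stable control of a humanoid robot across diverse motions for the first time.
Solving the "expert dilemma" and the "reality gap"
Traditional humanoid robot control strategies have long faced two core challenges:
Through its three-stage "divide, refine, fuse" architecture, BumbleBee is the first system to go from expert-policy optimization to general whole-body control within a single control framework, offering a new solution for general embodied-intelligence control.
Motion classification driven jointly by kinematics and semantics: building a "dual channel" for motion understanding
The system builds multimodal features and aligns them in a joint latent space, so that each motion is represented at both the kinematic and the semantic level:
Kinematic feature extraction: starting from human motion sequences in SMPL format, forward kinematics converts them into 3D joint coordinates in the world frame (key points such as the head, pelvis, hands, and feet), supplemented with dynamic physical quantities such as foot velocity and root displacement; the result is then encoded with a Transformer.
Expert dilemma ...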
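Goodbye to massive annotation! A Zhejiang University team proposes GUI-RCPO, letting GUI grounding evolve on its own from unlabeled data
QbitAI · 2025-09-05 01:49
Contributed by the ZJU REAL Lab team | QbitAI (WeChat official account: QbitAI)
Agents can now precisely identify and locate target elements without massive amounts of annotated data!
Researchers from Zhejiang University and other institutions propose GUI-RCPO, a self-supervised reinforcement learning method that lets a model improve its GUI grounding ability on its own using unlabeled data.
What is GUI grounding, and why improve it?
In short, GUI agents built on vision-language models have been advancing rapidly in recent years. Given a single natural-language instruction, they can operate computer, phone, and web interfaces with human-like hand-eye coordination.
A key capability of a GUI agent is GUI grounding: given a user's natural-language instruction, the agent must precisely identify and locate the actionable target element in the user interface.
Strong GUI grounding lets a GUI agent understand graphical interfaces better and interact with them more precisely.
Yet training this seemingly simple capability requires large-scale, high-quality annotated data. The vast majority of current methods need annotated datasets on the order of millions of samples, and building such data takes a great deal of manual labor and time.
GUI-RCPO addresses exactly this problem. Its core principle is as follows:
By innovatively applying Test-time ...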
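AI generates Apple Metal kernels, boosting PyTorch inference speed by 87%
QbitAI · 2025-09-04 08:37
henry, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)
Are AI-generated Metal kernels for Apple silicon better than the official ones?
New research from Gimlet Labs shows that on Apple devices, AI can not only generate Metal kernels automatically but also deliver an 87% improvement in PyTorch inference speed over baseline kernels.
More striking still, the AI-generated Metal kernels achieved an average 1.87x speedup across the 215 PyTorch modules tested, with some workloads running hundreds of times faster than the baseline.
Is AI really going to Make Apple AI Great Again?
Generating kernels for Apple devices with AI
The conclusion first: using AI to automate kernel optimization can significantly improve model performance without modifying user code, adopting a new framework, or porting.
As for why Apple? Don't ask; the answer is that it is the world's largest hardware vendor (doge).
Next, let's look at how the researchers did it:
To demonstrate this, the researchers selected 8 top models from Anthropic, DeepSeek, and OpenAI and had them generate optimized GPU kernels for Apple devices to speed up PyTorch inference.
Experimental setup
First, in terms of model selection, the models tested include: claude-sonnet-4, claude-opus-4; gpt-4o, gpt-4.1, gpt ...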