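A boon for photo-editing beginners! A smart retouching agent precisely invokes 200+ professional tools from a single sentence, from Tencent Hunyuan & Xiamen University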
量子位· 2025-12-26 04:24
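Contributed by the JarvisEvo team | QbitAI

Turn a photo into a blockbuster with a single sentence: simpler than professional software, more controllable than AI retouching!

Tencent Hunyuan and Xiamen University have jointly released JarvisEvo, a unified image-editing agent that emulates a human expert designer and retouches photos through iterative editing, visual perception, self-evaluation, and self-reflection.

"Think like an expert, polish like a craftsman." JarvisEvo can not only retouch with Lightroom but also "see" the changes after each edit and judge for itself whether they are good, achieving self-evolution without external rewards. Let's take a closer look.

Research background and motivation

In recent years, instruction-based image editing models have made significant progress, but they still face two core challenges in the pursuit of a "professional-grade" retouching experience:

1. Instruction Hallucination: Existing text-only chain-of-thought (Text-only CoT) reasoning has an information bottleneck. During reasoning the model cannot "see" the intermediate editing results and proceeds to the next step based only on a textually imagined visual result, which easily leads to factual errors and cannot guarantee that every step matches the user's intent.
2. Reward Hacking: During preference alignment with reinforcement learning, the policy model (Policy) is updated dynamically, while the reward model (R ...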
Inference cost driven down to 1 yuan per million tokens: Inspur Information unlocks the "last mile" of scaling up agents
量子位· 2025-12-26 04:24
Core Viewpoint
- The global AI industry has shifted from a competition over model performance to a "life-and-death race" to deploy intelligent agents at scale, where cost reduction is no longer optional but decisive for profitability and industry breakthroughs [1]

Group 1: Cost Reduction Breakthrough
- Inspur Information has launched the Yuan Brain HC1000 ultra-scalable AI server, bringing inference cost down to 1 yuan per million tokens for the first time [2][3]
- This breakthrough is expected to remove the cost barrier to industrializing intelligent agents and to reshape the underlying logic of competition in the AI industry [3]

Group 2: Future Cost Dynamics
- Liu Jun, Chief AI Strategist at Inspur, stressed that 1 yuan per million tokens is only a temporary victory: token consumption and demand for complex tasks will grow exponentially, so current cost levels remain insufficient for widespread AI deployment [4][5]
- For AI to become a basic resource like water and electricity, token costs must fall significantly further; low cost is evolving from a "core competitiveness" into a "ticket for survival" in the intelligent agent era [5]

Group 3: Historical Context and Current Trends
- The AI era is at a critical point similar to one in the history of the internet, where sharp reductions in communication costs drove the emergence of new application ecosystems [7]
- As technology advances and token prices fall, companies can apply AI to more complex and energy-intensive tasks, leading to an exponential increase in token demand [8]

Group 4: Token Consumption Data
- Data from various sources shows a sharp increase in token consumption: ByteDance's Doubao model now uses over 50 trillion tokens per day, a tenfold increase from the previous year [13]
- Google's platforms are processing 1.3 quadrillion tokens monthly, equivalent to a daily average of 43.3 trillion, up from 9.7 trillion a year ago [13]

Group 5: Cost Structure Challenges
- Over 80% of current token costs come from computing expenses; the core issue is the mismatch between inference and training loads, which leads to inefficient resource utilization [12]
- The architecture must be fundamentally restructured to raise the output efficiency of each unit of computing power, addressing low utilization during inference and the "storage wall" bottleneck [14][16]

Group 6: Innovations in Architecture
- The Yuan Brain HC1000 uses a new DirectCom architecture that efficiently aggregates large numbers of domestic AI chips, which is what enables the breakthrough in inference cost [23]
- The architecture supports ultra-large-scale lossless expansion and improves inference performance by 1.75 times, with single-card utilization efficiency (MFU) potentially increasing by up to 5.7 times [27]

Group 7: Future Directions
- Liu Jun stated that a sustainable, significant reduction in token costs requires fundamental innovation in computing architecture, shifting the focus from scale to efficiency [29]
- The AI industry must innovate in product technology, develop computing architectures dedicated to AI, and explore specialized computing chips to optimize software and hardware together [29]
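The token figures quoted above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope illustration, not anything published by Inspur: it converts a monthly token volume into a daily average and prices a daily volume at the quoted 1 yuan per million tokens. The 30-day month, the function names, and the idea of pricing Doubao's volume at that rate are all assumptions made for illustration.

```python
YUAN_PER_MILLION_TOKENS = 1.0  # headline inference cost quoted in the summary


def daily_average(monthly_tokens: float, days_in_month: int = 30) -> float:
    """Convert a monthly token volume into a daily average (30-day month assumed)."""
    return monthly_tokens / days_in_month


def daily_cost_yuan(daily_tokens: float, price: float = YUAN_PER_MILLION_TOKENS) -> float:
    """Cost in yuan of serving `daily_tokens` at `price` yuan per million tokens."""
    return daily_tokens / 1_000_000 * price


# A daily average of about 43.3 trillion corresponds to about 1.3 quadrillion per month.
google_daily = daily_average(1.3e15)

# Hypothetical: what 50 trillion tokens per day would cost at 1 yuan per million.
doubao_cost = daily_cost_yuan(50e12)

print(f"Daily average: {google_daily / 1e12:.1f} trillion tokens")
print(f"Hypothetical daily cost: {doubao_cost / 1e6:.0f} million yuan")
```

Even at the headline price, a 50-trillion-token day would cost on the order of tens of millions of yuan, which is consistent with the argument that further cost reductions are needed.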
Tesla passes the "physical Turing test"! NVIDIA's robotics head raves about it, and it went viral over Christmas
量子位· 2025-12-26 04:24
Core Viewpoint
- Tesla's FSD v14 has been described as the first AI to pass the "physical Turing test," showcasing significant advances in autonomous driving technology [1][7].

Group 1: User Experience and Feedback
- Jim Fan, NVIDIA's robotics head, expressed astonishment at the FSD v14 experience, stating it felt indistinguishable from a human driver [3][4].
- User feedback on FSD v14 has been overwhelmingly positive, with many Tesla owners describing the technology as addictive [6][10].
- Specific user experiences highlight FSD's improved decision-making, such as reading parking signs correctly and executing lane changes decisively [11][12][26].

Group 2: Technical Enhancements
- The FSD v14.2.2 update includes significant upgrades to the neural network's visual encoder, enhancing perception and understanding capabilities [32].
- New features allow better recognition of emergency vehicles and dynamic navigation adjustments in response to real-time traffic conditions [35][37].
- The update introduces two new driving modes, SLOTH and MADMAX, which cater to different driving styles and preferences [44].

Group 3: Competitive Landscape
- Tesla's Robotaxi service is still in its early stages, with approximately 30 vehicles deployed in Austin, compared to Waymo's nearly 200 vehicles in the same area [42].
- Waymo leads in market presence and operational scale, with over 2,500 vehicles across multiple cities and a significant number of weekly paid rides [43][47].
- Despite the current gap, Tesla's FSD improvements and growing user interest indicate potential for accelerated growth in the Robotaxi market [53][54].

Group 4: Future Outlook
- Elon Musk has set ambitious goals for Tesla's Robotaxi service, aiming for full autonomy without safety monitors, a goal that appears to be progressing with the latest FSD updates [29][30].
- The ongoing competition between Tesla and Waymo highlights differing technological approaches, with Tesla focusing on a neural network model while Waymo relies on a modular system [63].
- The future of autonomous driving technology will likely influence consumer purchasing decisions, making it a critical area for both companies [69].
Replace every line of C/C++ in Windows with AI-written code! Microsoft responds
量子位· 2025-12-25 13:32
Core Viewpoint
- Microsoft has denied plans to rewrite Windows 11 using AI, contradicting earlier statements from an internal engineer about eliminating C/C++ code by 2030 through AI and Rust integration [2][3][9].

Group 1: Microsoft's AI Strategy
- The initial claim by a Microsoft engineer suggested that one engineer could rewrite one million lines of code in a month, which sparked significant online debate and concern about the feasibility and risks of such an approach [4][5][10].
- Many users expressed admiration for Microsoft's ambition but also raised alarms about the risks of aggressively pushing AI into critical codebases [6][10].
- The engineer later clarified that the post was intended to attract like-minded engineers, not to announce a new strategy for Windows 11, and that the project is about exploring technology for language migration rather than a definitive plan [16][17].

Group 2: Concerns Over Code Quality and Legacy Issues
- The transition from C/C++ to Rust raises concerns about the quality of AI-generated code, with estimates suggesting that current AI technology could produce a bug for every ten lines of code, leading to significant potential issues in a large codebase [13][25].
- Microsoft's historical reliance on C/C++ has resulted in approximately 70% of Windows security vulnerabilities being attributed to these languages, highlighting the need for a more secure alternative like Rust [25][26].
- The complexity and legacy of Windows code, accumulated over decades, pose significant challenges for any large-scale rewrite, as many existing implementations may be critical to system stability [38][40].

Group 3: Rust as a Potential Solution
- Rust is viewed as a promising alternative because its design focuses on memory safety, which could help mitigate long-standing security issues in Windows [27][34].
- However, Rust's ecosystem is still maturing, and the transition would require substantial investment in developer training and adaptation, which could hinder immediate implementation [43][44].
- Despite the challenges, Microsoft has begun experimenting with Rust in rewriting parts of the Windows kernel, although this effort remains limited to a few modules [36].

Group 4: The Role of AI in Development
- The rapid advancement of AI programming capabilities gives Microsoft an opportunity to use AI as a bridge in the transition to Rust, potentially lowering the barriers to the switch [45].
- However, the effectiveness of AI as a reliable tool for such critical tasks remains uncertain, and current AI technologies may not yet be capable of handling the complexities of core system engineering [46][48].
- Microsoft's CEO has emphasized the importance of AI in the company's future, indicating a strong internal push towards integrating AI into development processes, but the recent backlash suggests a need for a more measured approach [50][53][56].
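The code-quality concern above is easier to weigh with the two quoted figures combined. The following is a rough illustration only: it assumes the "one bug per ten lines" estimate applies uniformly to the "one million lines per engineer per month" claim, which the source does not state; the function name is hypothetical.

```python
def expected_bugs(lines_of_code: int, lines_per_bug: int = 10) -> int:
    """Naive bug estimate: one bug for every `lines_per_bug` lines of generated code."""
    return lines_of_code // lines_per_bug


monthly_lines_per_engineer = 1_000_000  # rewrite rate claimed in the engineer's post
print(expected_bugs(monthly_lines_per_engineer))  # 100000
```

Under these assumptions, a single engineer's monthly output would carry on the order of 100,000 bugs to find and fix, which is why the review burden, not generation speed, dominates the debate.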
From 6,999 yuan! Xiaomi's most expensive Ultra ever is here: goodbye to 256GB, with imaging that goes head-to-head with the iPhone 17 Pro Max
量子位· 2025-12-25 13:32
Core Viewpoint
- Xiaomi has launched its new flagship imaging smartphone, the 17 Ultra, which emphasizes optical photography enhancements and features significant upgrades over its predecessor, the 15 Ultra [2][3].

Pricing and Variants
- The 17 Ultra starts at 6,999 yuan for the 12GB+512GB version, with the 16GB+512GB configuration priced at 7,499 yuan and the 16GB+1TB configuration at 8,499 yuan [7].
- A special edition, "Xiaomi 17 Ultra by Leica," is available at 7,999 yuan for the 16GB+512GB version and 8,999 yuan for the 1TB version, each 500 yuan more than its standard counterpart [9].

Imaging Technology
- The 17 Ultra features a 1-inch sensor with a 3.2-micron pixel size and an f/1.67 aperture, allowing for double the light intake compared to the iPhone 17 Pro Max [16][17].
- The LOFIC technology enhances dynamic range, with the new pixel structure offering 6.3 times the electron capacity of the previous generation, improving performance in high-contrast scenes [19][20][21].
- The device includes a "fireworks capture" mode designed for challenging lighting conditions, showcasing its advanced imaging capabilities [29].

Optical Zoom and Performance
- The 17 Ultra incorporates a 200-megapixel continuous optical zoom, utilizing a 28nm process that reduces power consumption by 40% [35].
- It achieves high-quality imaging across various focal lengths without relying on digital cropping, maintaining full resolution [46][49].
- The optical architecture includes eight elements in three groups, with special glass lenses that enhance light transmission and color accuracy [50][54].

Memory and Market Trends
- The 17 Ultra starts with a minimum storage of 512GB, reflecting a shift in consumer demand towards higher memory capacities due to the rising need for AI applications [60][64].
- The overall memory supply chain is experiencing price increases, impacting smartphone pricing strategies [65].
单卡2秒生成一个视频!清华联手生数开源TurboDiffusion,视频DeepSeek时刻来了
量子位· 2025-12-25 11:51
Core Viewpoint
- The article introduces TurboDiffusion, an open-source framework developed by Tsinghua University's TSAIL lab and Shengshu Technology, which significantly accelerates video generation, achieving speeds up to 200 times faster while maintaining high quality [2][3][39].

Group 1: Speed and Efficiency
- TurboDiffusion generates a 5-second video at 480P resolution in just 1.9 seconds on a single RTX 5090 GPU, compared to the original time of approximately 184 seconds [3][13].
- For a 720P video, the TurboDiffusion framework can generate content in 24 seconds, a substantial improvement over previous models [12].
- The framework's enhancements enable real-time video generation, reducing the generation delay from 900 seconds to just 8 seconds for high-quality 1080P videos [16][39].

Group 2: Technical Innovations
- TurboDiffusion combines four key technologies to optimize video generation: SageAttention, Sparse-Linear Attention (SLA), rCM step distillation, and W8A8 quantization [22][24][32].
- SageAttention2++ reduces the computational load of attention mechanisms, achieving a speed increase of 3-5 times while halving memory usage [25][27].
- SLA focuses on important pixels and maintains linear complexity, allowing for additional speed improvements when combined with SageAttention [28][29].

Group 3: Industry Impact
- The advances made by TurboDiffusion are expected to lower cloud inference costs significantly, making it possible to serve 100 times more users with the same computational power [42].
- The technology is compatible with domestic AI chip architectures, promoting self-sufficiency in China's AI infrastructure [42].
- The framework opens up new possibilities for real-time video editing, interactive video generation, and automated short film production, potentially leading to innovative product forms in the AIGC sector [42].
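Of the four techniques listed above, W8A8 quantization (8-bit weights, 8-bit activations) is the simplest to illustrate. The sketch below shows a generic symmetric per-tensor int8 scheme in pure Python; this choice is an assumption for illustration, since the summary does not describe TurboDiffusion's actual quantization kernels, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def w8a8_dot(weights, activations):
    """Dot product with both operands quantized to 8 bits, then rescaled to float."""
    q_w, scale_w = quantize_int8(weights)
    q_a, scale_a = quantize_int8(activations)
    accumulator = sum(w * a for w, a in zip(q_w, q_a))  # integer-only accumulation
    return accumulator * scale_w * scale_a


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The integer accumulation is where hardware speedups come from in practice: 8-bit multiply-accumulate units have higher throughput and move less memory than their floating-point counterparts, at the price of a small rounding error.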
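Vector retrieval blows up! Fu Cong and Zhejiang University release the IceBerg Benchmark: HNSW is not optimal, and the evaluation system is seriously biased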
量子位· 2025-12-25 11:51
Core Insights
- Integrating multimodal data into RAG and agent frameworks is a hot topic in LLM applications, and vector retrieval is the most natural recall method for multimodal data [1]
- There is a misconception that vector retrieval has been standardized around HNSW, even though HNSW does not perform well in many downstream tasks [1]
- A new benchmark called IceBerg evaluates vector retrieval algorithms on downstream semantic tasks rather than traditional metrics like Recall-QPS, challenging past industry perceptions [1]

Group 1: Misconceptions in Vector Retrieval
- Many believe that vector retrieval methods are standardized, leading to reliance on HNSW without considering its performance in real-world tasks [1]
- The evaluation systems used in the past only scratch the surface of the complexities involved in vector retrieval [1]
- A significant disparity exists between the perceived effectiveness of vector retrieval methods and their actual performance in downstream tasks [7]

Group 2: Case Studies and Findings
- On a large-scale face verification dataset (Glint360K), face recognition accuracy saturated well before Recall reached 99%, indicating a disconnect between distance metrics and actual task performance [5]
- NSG, a state-of-the-art vector retrieval algorithm, has a clear advantage in distance-metric recall but underperforms RaBitQ in downstream semantic tasks [5]
- Different metric spaces can lead to vastly different outcomes in downstream tasks, highlighting the importance of metric selection in vector retrieval [6]

Group 3: Information Loss and Model Limitations
- An information loss funnel model is proposed to illustrate how information is lost at each stage of the embedding process, leading to discrepancies from expected outcomes [7]
- The capacity of representation models directly affects the quality of embeddings, with generalization errors and learning objectives impacting performance [10][11]
- Many models do not prioritize learning a good metric space, which can lead to significant information loss during the embedding process [13]

Group 4: Metric and Algorithm Selection
- The choice of metric (Euclidean vs. inner product) can have a substantial impact on results, especially when using generative representation models [15]
- Vector retrieval methods, categorized into space partitioning and graph-based indexing, perform differently depending on data distribution [17]
- The IceBerg benchmark reshuffles the rankings of vector retrieval algorithms, demonstrating that HNSW is not always the top performer in downstream tasks [18]

Group 5: Automation and Future Directions
- IceBerg provides an automated algorithm selection tool that helps users choose the right method without extensive background knowledge [21]
- Statistical indicators can reveal the affinity of embeddings to metrics and algorithms, facilitating automated decision-making [23]
- The research team calls for future vector retrieval studies to focus on task-metric compatibility and the development of unified vector retrieval algorithms [25]
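The point about metric choice can be made concrete with a toy example. The sketch below is an illustration with made-up 2-D vectors, not data or code from IceBerg: it shows that the "nearest" neighbor under Euclidean distance and the "best" match under inner product can differ when vectors are not normalized, and it defines recall as the overlap between retrieved results and the true neighbors. All names here are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(retrieved, ground_truth):
    """Fraction of the true top-k neighbors that the index actually returned."""
    return len(set(retrieved) & set(ground_truth)) / len(ground_truth)


query = (1.0, 0.0)
candidates = {"a": (1.0, 0.1), "b": (3.0, 2.0)}  # toy vectors, not normalized

nearest_l2 = min(candidates, key=lambda name: euclidean(query, candidates[name]))
best_ip = max(candidates, key=lambda name: inner_product(query, candidates[name]))

print(nearest_l2, best_ip)  # a b
print(recall_at_k(["a", "c"], ["a", "b"]))  # 0.5
```

Because the two metrics disagree on the top result, an index tuned for high recall under one metric can still return the wrong items for a task whose embeddings were trained under the other, which is the kind of mismatch the benchmark highlights.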
Hiring a director-level AI digital employee for 2,500 yuan a month: is that expensive?
量子位· 2025-12-25 11:51
Core Insights
- A profound transformation in corporate structure is occurring in Silicon Valley, where AI agents are evolving from mere tools into autonomous colleagues, with significant impact on the real estate industry [1]
- The shift from traditional software to AI-driven digital employee teams is redefining business processes and operational costs [2][3]

Group 1: AI in Real Estate
- The real estate sector, characterized by high capital intensity and complex decision-making, is becoming a breakthrough area for AI applications [3]
- Deep Intelligence has launched the "Real Estate AI-Ready" strategy, introducing a digital employee team that covers decision-making, marketing, and service scenarios [3][4]
- The digital employees can produce comprehensive market analysis reports with accuracy and efficiency comparable to senior analysts, at a fraction of the cost [3][11]

Group 2: Cost Efficiency and Workforce Transformation
- Traditional real estate marketing teams require 6-8 personnel with monthly costs exceeding 150,000 yuan, while digital employees can cover the same functions for around 2,500 yuan, reducing labor costs by over 90% [11]
- The future organizational structure will focus on maximizing AI efficiency, allowing human experts to concentrate on high-value tasks while digital employees handle standardized, time-consuming tasks [11]

Group 3: Unique Advantages of Specialized AI
- Unlike general AI models, Deep Intelligence's digital employees are designed to meet specific job requirements, ensuring they can perform complex tasks without extensive training [12][13]
- The proprietary AI space developed by Deep Intelligence integrates industry knowledge, business processes, and private data, creating a robust operational foundation for real estate applications [13][16]

Group 4: Industry Trends and Future Outlook
- The digital employee market in China is projected to reach 4.12 billion yuan in 2024, with year-on-year growth of 85.3%, indicating a strong trend towards AI integration in various industries [19]
- Companies that use AI to enhance their intellectual capacity will have a competitive edge over those relying solely on human resources [20][21]
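The "over 90%" cost claim above follows directly from the two monthly figures. A quick check, using only the numbers quoted in the summary (the function name is an illustrative assumption, and the 150,000 yuan figure is a lower bound, so the real reduction would be at least this large):

```python
def cost_reduction(traditional_monthly: float, digital_monthly: float) -> float:
    """Fractional saving when replacing one monthly cost with another."""
    return 1 - digital_monthly / traditional_monthly


reduction = cost_reduction(150_000, 2_500)
print(f"{reduction:.1%}")  # 98.3%
```

At these figures the saving is about 98%, so "over 90%" is a conservative statement of the same comparison.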
QbitAI is hiring editors and writers
量子位· 2025-12-25 11:51
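From the editorial team, reporting from Aofeisi | QbitAI

The AI boom is still surging, but if you do not yet know how to take part... why not join QbitAI?

We are a content platform centered on tracking new progress in AI. After 8 years of accumulation, we now have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning at the forefront of the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:
- AI industry: focus on innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
- AI finance: focus on venture investment and earnings reports in the AI field, tracking capital movements along the industry chain;
- AI products: focus on progress in AI applications and hardware devices.

All positions are full-time and based in Zhongguancun, Beijing. Positions at every level of ability are open in all directions; you are welcome to apply based on your background and experience.

The positions are open to:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time roles.

By joining us, you can:
- Stand at the crest of the AI wave: be among the first to encounter and understand the latest technologies and products in AI, and build a complete understanding of the field.
- Master new AI tools: apply new AI technologies and tools in your work to improve efficiency and creativity.
- Build personal influence: through writ ...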
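Revealing why agents struggle to land! 93% of enterprise projects stall in the last mile from POC to production | Amazon Web Services' Chen Xiaojian @ MEET2026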
量子位· 2025-12-25 06:08
Core Insights
- The true value of Agents lies not in impressive demonstrations but in their ability to operate effectively in production environments. Data indicates that over 93% of enterprise Agent projects get stuck in the transition from Proof of Concept (POC) to production [1][17].

Group 1: Agent Development and Challenges
- A successful Agent requires three essential modules: the model (brain), code (logic), and tools (connecting to the physical world). Integrating these three components effectively is the greatest engineering challenge [7][9].
- The transition from POC to production is hindered mainly by data quality discrepancies and a lack of engineering capabilities [7][17].
- The best time for model customization is during the foundational model training phase, similar to how humans learn languages more effectively at a young age [21][23].

Group 2: Engineering and Deployment Solutions
- To address the challenges of the deployment and production phases, the company has introduced Amazon Bedrock AgentCore, a comprehensive toolbox designed to manage foundational infrastructure dynamically [20].
- Strands Agents simplifies the development process, allowing complex functionality to be achieved with significantly less code [13][30].
- The company has also launched support for TypeScript and edge device deployment, expanding the applicability of Agents across platforms [15][30].

Group 3: Automation and Workflow Integration
- Large models have opened new possibilities for workflow automation, leading to Amazon Nova Act, which integrates large model capabilities with engineering functionality for end-to-end automation [29].
- The success rate of automation using Nova Act can exceed 80%, showcasing its effectiveness compared to traditional RPA tools [29].

Group 4: Case Studies and Industry Impact
- Blue Origin has built over 2,700 internal Agents using Bedrock and Strands Agents, achieving a 75% improvement in delivery efficiency and a 40% enhancement in design quality [30].
- Sony has developed an internal "Data Ocean" platform, serving over 57,000 internal users and processing up to 150,000 inference requests daily, while also improving compliance review efficiency by 100 times through model fine-tuning [30].
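The three-module view of an Agent described above (model, code, tools) can be sketched in a few lines. This is a minimal, framework-free illustration: it does not use the Strands Agents or Amazon Bedrock AgentCore APIs, the model is a stub standing in for a real LLM call, and every name in it (`get_weather`, `stub_model`, `run_agent`) is hypothetical.

```python
from typing import Callable, Dict

# Tools: hypothetical functions that connect the agent to the outside world.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS: Dict[str, Callable[[str], str]] = {"get_weather": get_weather}


# Model ("brain"): a stub standing in for an LLM call. It either requests a tool
# or, once an observation is present in the prompt, returns a final answer.
def stub_model(prompt: str) -> dict:
    if "Observation:" in prompt:
        return {"final": prompt.split("Observation:")[-1].strip()}
    return {"tool": "get_weather", "arg": "Seattle"}


# Code ("logic"): the loop that routes model decisions to tools and feeds results back.
def run_agent(task: str, max_steps: int = 3) -> str:
    prompt = task
    for _ in range(max_steps):
        decision = stub_model(prompt)
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["tool"]](decision["arg"])
        prompt = f"{task}\nObservation: {result}"
    return "step limit reached"


answer = run_agent("What is the weather in Seattle?")
print(answer)  # Sunny in Seattle
```

Even in this toy form, most of the code is routing, error boundaries, and step limits rather than model calls, which hints at why the integration work dominates the path from POC to production.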