量子位
2,700 GB of high-quality data trains a spatial-intelligence SOTA; the full stack behind it is open source
量子位· 2026-03-31 03:06
Core Viewpoint - The article emphasizes that the limitation of spatial intelligence in robotics is primarily due to insufficient data, which affects the generalization ability of models, leading to reliance on hardware solutions [1][2].

Group 1: Data Challenges in Robotics
- The lack of reliable data sources has historically forced the industry to compensate by enhancing hardware capabilities, particularly in the use of RGB-D cameras for spatial perception [3][4].
- RGB-D cameras, while popular, face significant challenges in accurately perceiving environments, especially in the presence of reflective or transparent surfaces, which can lead to erroneous data [5][6][9].

Group 2: Introduction of LingBot-Depth-Dataset
- Ant Group's LingBot-Depth-Dataset has been introduced as a solution to the data scarcity issue, comprising 2.71TB of data with 3 million pairs of labeled RGB-D data, including real and synthetic data from various environments [11][13][20].
- The dataset's diverse data distribution, collected from multiple depth cameras, enhances its applicability for training models in different scenarios, thus improving generalization [18][19].

Group 3: Advancements in Spatial Intelligence
- The deployment of LingBot-Depth has enabled robots to effectively grasp transparent and reflective objects, a task previously deemed challenging [22].
- Following this, Ant Group has released additional models like LingBot-VLA and LingBot-World, which integrate visual, linguistic, and action capabilities, further advancing the field of embodied intelligence [24][25][28].

Group 4: Software vs. Hardware in AI Development
- The article highlights a shift in focus within the industry towards prioritizing data and algorithm architecture over merely increasing the number and cost of sensors, as seen in the autonomous driving sector [30][31].
- This approach suggests that enhancing spatial intelligence through software methods can lead to more effective and cost-efficient solutions in robotics, aligning with the broader trend of prioritizing data-driven advancements [29][31].
Tang Jie's protégé builds a "lobster" investment army! A full quant-fund lineup of agents, with 39k stars on the open-source repo
量子位· 2026-03-31 03:06
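Wen Le, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Well, well: the investing edition of the "lobster" has arrived, and it is one of the hottest, most-starred open-source projects on GitHub lately.

An open-source financial framework called TradingAgents has assembled a trading team to clock in on your behalf.

It has racked up 39k+ stars and more than 7.2k forks on GitHub; you might think some quant fund had donated its crown jewels (doge).

The idea is simple: the system sets up several specialist roles that work together, including a fundamentals analyst, a sentiment analyst, a technical analyst, researchers, a trader, a risk manager, and more. After digging through the information and going through a round of evaluation and debate, these roles jointly decide whether to trade.

TradingAgents breaks complex trading into five collaborative layers:

Each trading round starts with the analyst team: four roles work simultaneously, pulling data from different angles.

The framework supports API access to several mainstream model providers: OpenAI, Anthropic, and others all work.

Fund manager, risk control, and trader, all in place

The framework comes from the Tauric Research team, a research and technology company focused on using AI for intelligent financial trading.

The founder and core author of TradingAgents, Xiao Yijia (肖易佳), earned his undergraduate degree at Tsinghua University under Professor Tang Jie (唐杰), the initiator of Zhipu, and is now a PhD student in computer science at UCLA.

| Team | Progress A ...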
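The workflow described in this excerpt (analyst roles gathering information in parallel, followed by a joint decision) can be sketched roughly as follows. This is a hypothetical illustration and not the TradingAgents API: the role names come from the article, while the function names, the fixed signal values, the risk rule, and the net-signal vote are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical analyst functions: each returns a signal in {-1, 0, +1}
# (sell / hold / buy). Real analysts would call an LLM and data sources.
def fundamentals_analyst(ticker: str) -> int:
    return 1

def sentiment_analyst(ticker: str) -> int:
    return 1

def technical_analyst(ticker: str) -> int:
    return -1

ANALYSTS = [fundamentals_analyst, sentiment_analyst, technical_analyst]

def risk_manager_approves(signals: list[int]) -> bool:
    # Assumed rule: veto the trade when the signals cancel out entirely.
    return abs(sum(signals)) >= 1

def decide(ticker: str) -> str:
    # Analysts run concurrently, mirroring "the roles work simultaneously".
    with ThreadPoolExecutor(max_workers=len(ANALYSTS)) as pool:
        signals = list(pool.map(lambda analyst: analyst(ticker), ANALYSTS))
    if not risk_manager_approves(signals):
        return "hold"
    total = sum(signals)
    return "buy" if total > 0 else "sell" if total < 0 else "hold"
```

Keeping each role as a plain function makes it easy to swap in a different model backend per role, which is one plausible reason a framework of this kind would separate roles in the first place.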
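Teaching LLMs to read "highlighting": edit the Key vectors before attention is computed, and use spectral decomposition to make the model "follow your lead" | ICLR'26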
量子位· 2026-03-31 03:06
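Submitted by the SEKA team
量子位 | WeChat official account QbitAI

Getting a large model to focus on a particular sentence in the prompt is harder than it sounds.

In NLP, attention steering is one of the core techniques for controlling what a large language model (LLM) focuses on, and prompt highlighting, which makes the model prioritize key text specified by the user, is a key strategy within it.

However, existing methods need to explicitly store the full attention matrix, which makes them entirely incompatible with efficient implementations such as FlashAttention and creates severe latency and memory bottlenecks.

To tackle this problem, Weixian (Waylon) Li of the University of Edinburgh, together with collaborators from Huawei's UK research institute, Queen Mary University of London, and RayNeo, proposed SEKA (Spectral Editing Key Amplification) and its adaptive variant AdaSEKA.

The method takes a different route: it edits the Key vectors directly before attention is computed and uses spectral decomposition to learn a "relevance subspace" that steers how attention is allocated. It is natively compatible with FlashAttention and adds almost no latency. The work has been accepted at ICLR 2026, a top AI conference.

Mathematically, this operation is equivalent to adding a low-rank bias term to the attention scores, but because it acts entirely on ...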
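The equivalence between editing a key and adding a low-rank bias to the attention score can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the SEKA implementation: it assumes the edit has the form k' = k + g·P·k, where P projects onto a subspace with a given orthonormal basis and g is a gain. How SEKA actually learns the subspace through spectral decomposition is not shown, and every name and number here is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(vec, basis):
    """Project vec onto the span of orthonormal basis vectors."""
    out = [0.0] * len(vec)
    for u in basis:
        coeff = dot(vec, u)
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out

def edit_key(key, basis, gain):
    """Amplify the part of the key that lies in the assumed relevance subspace."""
    proj = project(key, basis)
    return [k + gain * p for k, p in zip(key, proj)]

# Assumed 1-D relevance subspace inside a 3-D key space (orthonormal basis).
basis = [[1.0, 0.0, 0.0]]
gain = 0.5
query = [2.0, 1.0, 0.0]
key = [4.0, 3.0, 1.0]

edited = edit_key(key, basis, gain)
score_before = dot(query, key)
score_after = dot(query, edited)

# The change in score factors through the basis, so its rank is bounded by
# the number of basis vectors: this is the low-rank bias term.
low_rank_bias = gain * sum(dot(query, u) * dot(key, u) for u in basis)
```

Because the edit touches only the key vectors, the attention kernel itself runs unmodified, which is consistent with the article's point about FlashAttention compatibility.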
Claude Code can now control your computer! Development never leaves the terminal, and fully unattended mode begins
量子位· 2026-03-31 01:53
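Meng Chen, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Claude Code has launched Computer Use, smashing straight through the ceiling on development efficiency.

In the official demo, given a single instruction, the AI launches the app under development, reproduces the bug, fixes it, and tests the fix, all by itself.

It is like giving every developer an all-round test engineer.

This is already Anthropic's 76th update in 60 days.

Unlike the desktop Computer Use released last week, the CLI version is better suited to integration with existing development workflows.

Why give a CLI tool the ability to operate the computer?

Ultimately, the goal is a development experience that never requires leaving the terminal. It fits neatly into developers' existing command-line workflows with no need to switch interfaces, so the efficiency gain is more noticeable.

There are drawbacks, of course: it is Mac-only, leaving Windows and Linux users in tears.

Many people also complain that the quota is not enough even for ordinary code generation, and now there is a feature that looks even more token-hungry; who can actually afford to use it?

The scenarios this update supports cover almost all of developers' everyday pain points.

Since the start of March in particular, new features have shipped almost daily: Code Review, Channels, Dispatch, Computer Use, all substantial.

After this chart was made, March 25 saw the release of Claude Cod ...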
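2x speedup! KV cache compression looks beyond importance: a Shanghai Jiao Tong University team makes model inference "fast and stable" | ICLR'26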
量子位· 2026-03-31 01:53
Core Insights
- The article discusses the challenges and solutions related to KV cache compression in long-context reasoning for Vision-Language Models (VLM) and Large Language Models (LLM) [1][2][42]
- It introduces MixKV, a method that combines importance and diversity in KV cache selection to enhance stability and coverage in compressed contexts [5][13][42]

Group 1: KV Cache Challenges
- The lengthening of context leads to linear expansion of KV cache, resulting in increased memory usage and bandwidth costs, which negatively impacts throughput [3][5]
- Traditional compression methods often focus solely on "importance," neglecting the inherent "semantic redundancy" present in multimodal KV caches, which can lead to instability [5][12]

Group 2: Key Findings
- The research team visualized the statistical properties of KV, revealing that multimodal inputs exhibit a higher degree of semantic redundancy, indicating a larger compressible space [8][10]
- There are significant differences in redundancy levels across different heads within the same model, suggesting a non-uniform distribution of redundancy [10][12]

Group 3: MixKV Solution
- MixKV aims to retain KV entries that are both important and diverse, thereby reducing the risk of losing semantic coverage due to redundancy [13][23]
- The method consists of two scoring steps (importance and diversity) and a head-wise mixing approach to adaptively balance the two factors based on redundancy levels [14][15][16]

Group 4: Experimental Results
- MixKV demonstrated consistent performance improvements across various benchmarks in multimodal understanding, long-context reasoning, and GUI localization tasks [25][29][37]
- The method showed significant efficiency gains, reducing inference latency and peak memory usage under extreme compression conditions [41][42]

Group 5: Conclusion
- MixKV represents a critical upgrade for KV cache compression in long-context reasoning, emphasizing the need to consider redundancy structures in the design paradigm for scalable deployment of VLMs and LLMs [42]
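The two scoring steps and the head-wise mixing summarized above can be illustrated with a small sketch for a single attention head. It is not the MixKV implementation: the diversity score (one minus mean cosine similarity to the other keys), the redundancy estimate (mean of those similarities), and the linear mixing rule are assumptions chosen for the example, and the importance scores are taken as given inputs.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mean_similarity_to_others(keys):
    """For each key, the mean cosine similarity to every other key.

    Requires at least two non-zero keys.
    """
    n = len(keys)
    return [
        sum(cosine(keys[i], keys[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def select_kv(keys, importance, budget):
    """Pick `budget` entries for one head by mixing importance and diversity."""
    sim = mean_similarity_to_others(keys)
    diversity = [1.0 - s for s in sim]
    # Assumed head-wise weight: more redundant heads lean more on diversity.
    redundancy = min(1.0, max(0.0, sum(sim) / len(sim)))
    mixed = [
        (1.0 - redundancy) * imp + redundancy * div
        for imp, div in zip(importance, diversity)
    ]
    ranked = sorted(range(len(keys)), key=lambda i: mixed[i], reverse=True)
    return sorted(ranked[:budget])

# Keys 0 and 1 are duplicates; key 2 carries different information.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
importance = [0.9, 0.8, 0.6]
kept = select_kv(keys, importance, budget=2)
```

With these inputs, ranking by importance alone would keep the two duplicate entries 0 and 1, whereas the mixed score keeps entries 0 and 2, preserving the one key that carries distinct information.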
The world's new OCR king comes from Chinese open source! 73,300+ stars on GitHub
量子位· 2026-03-30 10:36
Core Viewpoint - The article highlights the historic shift in the OCR (Optical Character Recognition) landscape, with Baidu's PaddleOCR surpassing Google's Tesseract OCR to become the top OCR project on GitHub, marking a significant achievement for Chinese open-source initiatives in this domain [2][5][71].

Group 1: PaddleOCR's Rise
- PaddleOCR has achieved over 73,300 stars on GitHub, officially dethroning Tesseract OCR, which had dominated the field for nearly 40 years [2].
- The project has also maintained a leading position on Hugging Face, becoming an essential tool for global developers in OCR and document parsing [3].
- The rapid growth of PaddleOCR is attributed to its integration with Baidu's Wenxin large model, which has enhanced its capabilities significantly [13][15].

Group 2: Technological Innovations
- PaddleOCR's success is rooted in its innovative approach, utilizing a data-centric optimization strategy that emphasizes the quality and diversity of training data rather than merely increasing model size [34][40].
- The introduction of models like PaddleOCR-VL and PaddleOCR-VL-1.5 has set new benchmarks in document parsing, achieving scores of 92.6 and 94.5 on the OmniDocBench V1.5, respectively [20][22].
- PaddleOCR-VL's unique "Coarse-to-Fine" architecture allows for efficient processing of high-resolution documents by focusing on key areas, significantly reducing computational costs [44][46].

Group 3: Market Dynamics and Future Trends
- The OCR market is experiencing a surge in competition, with numerous companies launching new OCR models, indicating a growing recognition of the importance of OCR technology in AI [49][62].
- The role of OCR is evolving from a simple document extraction tool to a foundational element in the data ecosystem for large models, enabling better understanding and processing of real-world information [65][67].
- Future developments in OCR are expected to focus on specialized applications and enhanced collaboration between local and cloud-based processing, paving the way for more sophisticated information processing solutions [69][70].
Stop giving AI only odd jobs! AI tools are taking over the entire ad-delivery chain
量子位· 2026-03-30 10:36
Core Viewpoint - The integration of AI into the marketing industry is an established trend, with significant growth potential and challenges in implementation [1][2][4].

Group 1: Market Overview
- The AI marketing market in China reached a scale of 66.9 billion yuan last year, with a compound annual growth rate of 26.2% [2].
- The growth is driven by concentrated investments across the entire industry chain, from content production to advertising decision-making [3].

Group 2: Current Challenges
- Most AI marketing tools currently exist in isolated forms, addressing only specific issues, requiring advertisers to connect different stages themselves [5][6].
- The complexity of marketing scenarios makes it difficult for AI to be effectively implemented, as each stage has different technical requirements and high interdependence [11][13].

Group 3: AI Marketing Evolution
- The industry is recognizing the need for multi-stage collaboration, leading to a clearer trend towards AI integration across the entire marketing chain [7].
- Kuaishou's commercial AI exemplifies this approach, integrating AI at every decision-making point from pre-campaign material production to post-campaign analysis [8][25].

Group 4: Technical Solutions
- Kuaishou's approach involves designing specific engineering solutions for each marketing scenario, ensuring that AI capabilities operate within a unified data system [24][50].
- The marketing process includes several common stages: material production, strategy formulation, advertising execution, and diagnostic review, all of which Kuaishou's AI capabilities address [25][53].

Group 5: Material Production
- In the material production phase, Kuaishou uses large models to transform the concept of "good material" into quantifiable structures, allowing for scalable replication [27][30].
- This process involves analyzing historical data and industry trends to identify common features that can be standardized [30].

Group 6: Strategy Formulation
- Kuaishou employs a multi-agent collaboration model for strategy formulation, allowing for parallel processing of tasks that traditionally required extensive human collaboration [33][36].
- This method significantly reduces the time required for strategy development, enhancing the overall quality of the output [37].

Group 7: Advertising Execution
- The advertising execution phase demands the highest technical standards, with real-time signal perception capabilities embedded in Kuaishou's system to ensure timely decision-making [40][42].
- AI continuously monitors various data streams to automatically trigger necessary actions without human intervention [42].

Group 8: Diagnostic Review
- The diagnostic review phase is crucial yet often neglected due to its complexity; Kuaishou's AI facilitates cross-stage attribution, integrating all data into a unified analysis framework [49][50].
- AI generates comprehensive review documents that provide actionable insights for future campaigns, transforming the review process into a valuable input for subsequent strategies [51][52].

Group 9: Strategic Importance
- Kuaishou's commitment to a full-chain AI approach stems from the limitations of single-point AI tools, which fail to enhance overall efficiency despite improving localized tasks [54][55].
- The ultimate goal is to ensure that advertisers achieve sustained business growth on the platform, which in turn supports a healthy commercial ecosystem [59][60].
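VLMs keep failing geometry problems? GEODPO starts with "seeing": structured representations plus DPO optimization let the model understand the figure before reasoning | ICLR'26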
量子位· 2026-03-30 10:36
Core Viewpoint - The research highlights that the failure of vision-language models (VLMs) on geometric problems is primarily due to perceptual errors rather than reasoning difficulties, a cause that existing studies have not systematically analyzed [3][4].

Group 1: Geometric Perception Issues
- VLMs exhibit significant performance drops when dealing with geometric shapes, indicating a shortcoming in their geometric perception capabilities [2][3].
- Common issues include misidentifying basic geometric elements (points, lines, circles), failing to detect key structural relationships (collinearity, perpendicularity, tangency), and grounding errors in images [10][11].

Group 2: GEOPERCEIVE Framework
- The research team introduced GEOPERCEIVE, the first independent evaluation framework focused on geometric perception capabilities, which allows for a clearer analysis of model performance by separating perceptual errors from reasoning errors [9][25].
- GEOPERCEIVE assesses models at a granular level, evaluating the accuracy of each geometric element and structural relationship, thus pinpointing specific capability bottlenecks [16][25].

Group 3: GEODPO Optimization Method
- GEODPO, a Translator-Guided Reinforcement Learning method, was proposed to optimize model performance by using structured rewards based on geometric matching scores, enhancing stability and interpretability [19][26].
- The method demonstrated improved geometric perception capabilities and better out-of-distribution generalization, indicating its effectiveness in addressing distribution shifts [21][26].

Group 4: Implications and Future Directions
- The findings suggest that geometric perception is a crucial factor influencing geometric reasoning performance, and structured reinforcement learning offers a stable optimization path [26].
- The research paradigm established through this work can be extended to other complex tasks, emphasizing the importance of structured representation and computable reward functions in model training [28][29].
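A computable reward over structured geometric descriptions, combined with a DPO-style objective, might look roughly like the sketch below. It is an illustration under assumptions rather than the GEODPO code: the F1-style matching score, the string encoding of geometric facts, and the best-versus-worst pairing rule are hypothetical choices. The loss function is the standard DPO loss for one preference pair.

```python
import math

def matching_score(predicted: set, reference: set) -> float:
    """F1 overlap between predicted and reference geometric facts."""
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

def build_preference_pair(candidates, reference):
    """Rank candidates by score: the best is 'chosen', the worst 'rejected'."""
    ranked = sorted(candidates, key=lambda c: matching_score(c, reference))
    return ranked[-1], ranked[0]

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical structured descriptions of one diagram.
reference = {"point(A)", "point(B)", "line(A,B)", "perpendicular(AB,CD)"}
good = {"point(A)", "point(B)", "line(A,B)"}
bad = {"point(A)", "circle(O)"}
chosen, rejected = build_preference_pair([bad, good], reference)
```

Scoring each element and relation separately is what makes the reward inspectable: a low score can be traced to the specific facts the model missed or hallucinated.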
The year's most closely watched AI rankings are here! Applications open today
量子位· 2026-03-30 10:36
Core Insights - The article discusses the transition of generative AI in China from a "new technology" to a "new tool" and now to a reality that businesses must confront, impacting various aspects such as content production, R&D efficiency, marketing methods, team collaboration, and decision-making processes [1]

Group 1: Event Overview
- The Fourth China AIGC Industry Summit will take place in May 2026, where QbitAI (量子位) will announce the results of its evaluation of generative AI companies and products based on their performance and feedback over the past year [1][2]
- The summit aims to invite millions of industry practitioners to witness the recognition of outstanding companies [2]

Group 2: Evaluation Criteria for AIGC Companies
- The evaluation will focus on companies that are either based in China or have their main business operations in China, and that are primarily engaged in generative AI or have widely applied AI in their core business [7]
- Companies must have demonstrated outstanding performance in technology/products and commercialization over the past year [7]

Group 3: Evaluation Dimensions for AIGC Companies
- The evaluation will consider several dimensions:
  1. **Technical Dimension**: Assessing the company's technical strength, R&D capabilities, and innovation [12]
  2. **Product Dimension**: Evaluating the core product's innovation, market adaptability, and user experience [12]
  3. **Market Dimension**: Analyzing the company's market performance and growth opportunities [12]
  4. **Potential Dimension**: Focusing on the core team's strength and brand potential [12]

Group 4: Evaluation Criteria for AIGC Products
- The products must be based on generative AI capabilities, have mature technology, be market-released with a certain user scale, and have significant technological innovations or functional iterations in the past year [13]
- Evaluation will focus on:
  1. **Product Technical Strength**: Advanced technology, maturity, and efficiency [13]
  2. **Product Innovation**: Uniqueness in functionality, experience, and application scenarios [13]
  3. **Product Performance**: User feedback and market performance [13]
  4. **Product Potential**: Future development and market expansion potential [13]

Group 5: Registration Information
- Registration for the evaluation is open now and will close on April 27, with results announced at the May summit [14]
- Companies can register through a provided link or contact QbitAI staff for inquiries [14]
The whole company eats lobster together! This open-source project brings OpenClaw to enterprise-grade deployment
量子位· 2026-03-30 09:16
Core Insights
- The article discusses the challenges of scaling OpenClaw for enterprise use, highlighting the need for user permission management, resource allocation, and auditing capabilities [1]
- ClawManager is introduced as an open-source project designed to fill the management gaps in OpenClaw for enterprise deployment [2][4]

Summary by Sections

ClawManager Overview
- ClawManager is the first enterprise-level deployment management solution for OpenClaw, aimed at enhancing its management capabilities [4]
- The project has low deployment requirements, needing only one Kubernetes node, 4 CPU cores, 8GB RAM, and 20GB disk space, making it accessible for small teams [5]

Capability Modules
- ClawManager consists of eight modules divided into two core layers: instance management and AI governance, which together support an operational enterprise-level OpenClaw environment [6]
- Key modules include:
  - Desktop Cluster Management: Batch deployment and lifecycle management, reducing deployment time from hours to minutes [7]
  - AI Gateway: Unified model access and routing for seamless integration with OpenClaw [7]
  - Auditing and Observability: Trace tracking and session analysis for compliance [7]
  - Risk Governance: Rule engine and sensitive content detection for secure AI usage [7]
  - Cost Management: Tracking and analyzing AI usage costs [7]
  - Security Infrastructure: Egress proxy and secret management for enhanced security [7]
  - OpenClaw Special: Configuration backup and memory migration for asset persistence [7]
  - Multi-language Management Backend: Unified control for user, instance, and resource management [7]

Instance Management
- The instance management layer focuses on the relationship between users and the environment, allowing for quick environment creation through CSV import for user lists [10]
- Each instance can have individually configured CPU, memory, and GPU limits, ensuring isolation through Kubernetes mechanisms [10]

AI Governance
- The AI governance layer addresses the relationship between calls and compliance, featuring an AI Gateway for unified model requests and routing [14][15]
- Each LLM call generates a unique trace ID for auditing, allowing for detailed tracking and compliance checks [16]
- Cost statistics can be categorized by token types, providing insights into usage expenses [17]

Security and Compliance
- ClawManager includes a rules engine for sensitive content detection, establishing clear safety boundaries for enterprise AI usage [19]
- The unified authentication gateway and complete call auditing facilitate compliance and security, making it easier for companies to scale OpenClaw [36]

Impact on Roles
- The introduction of ClawManager changes the responsibilities of various roles within the organization, allowing IT teams to focus on proactive management rather than reactive troubleshooting [25][29]
- Users can confidently use OpenClaw as a long-term work environment due to the unified backup and cross-instance migration capabilities [30][32]

Open Source and Community
- ClawManager is released under the MIT license, allowing for code review and ensuring data sovereignty for enterprises [37]
- The open-source nature of the project enables community contributions, enhancing the tool's capabilities and fostering shared experiences among different organizations [43]

Conclusion
- The shift towards open-source enterprise AI infrastructure tools democratizes access to advanced management capabilities, allowing smaller teams to compete with larger organizations [40][42]
- Each deployment contributes to the broader AI Agent ecosystem, facilitating the scaling of AI solutions across various industries [44]
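The per-call trace ID and the cost breakdown by token type described under AI Governance can be sketched in a few lines. This is a hypothetical illustration and does not reflect ClawManager's actual code or API: the class name, the record fields, and the per-token prices are all assumptions made for the example.

```python
import uuid
from collections import defaultdict

# Hypothetical prices per 1,000 tokens, keyed by token type.
PRICE_PER_1K = {"input": 0.5, "output": 1.5}

class GatewayLedger:
    """Records one audit entry per LLM call and sums cost by token type."""

    def __init__(self):
        self.audit_log = []
        self.cost_by_type = defaultdict(float)

    def record_call(self, user: str, model: str, tokens: dict) -> str:
        trace_id = uuid.uuid4().hex  # unique ID for later audit lookups
        self.audit_log.append(
            {"trace_id": trace_id, "user": user,
             "model": model, "tokens": dict(tokens)}
        )
        for token_type, count in tokens.items():
            self.cost_by_type[token_type] += count / 1000 * PRICE_PER_1K[token_type]
        return trace_id

ledger = GatewayLedger()
first = ledger.record_call("alice", "model-a", {"input": 2000, "output": 1000})
second = ledger.record_call("bob", "model-a", {"input": 1000, "output": 0})
```

Routing every call through one ledger is what lets auditing and cost reporting share the same records, so a single trace ID can answer both "who called what" and "what did it cost".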