Cracking the hard problem of structured long-document retrieval! A new framework cures models' "structural blindness"
量子位· 2025-09-25 11:42
Core Insights
- The article introduces SEAL (Structure and Element Aware Learning), a new contrastive learning framework designed to enhance models' understanding of long documents through structural awareness and element alignment [1][8].

Group 1: SEAL Framework Overview
- SEAL innovatively integrates both the macro-level structure and micro-level semantic elements of documents into a unified embedding space, significantly improving pre-trained language models' ability to understand and represent structured data [3].
- The framework addresses two main challenges in long-document retrieval: how to make models aware of document hierarchy, and how to promote precise alignment between user queries and specific document elements [18][25].

Group 2: Training Strategies
- The framework employs two complementary training strategies: Structure Aware Learning (SAL) and Element Aware Learning (EAL) [9].
- SAL focuses on understanding the "skeleton" of documents by presenting the model with two versions of a document, one with structural tags and one without, encouraging it to learn the inherent structural function of each text segment [12][13].
- EAL enhances the model's grasp of local elements' semantic roles by introducing a masking mechanism, requiring the model to infer overall document relevance from incomplete information [14][15].

Group 3: Experimental Results
- Applying the SEAL framework notably improved the BGE-M3 model's retrieval ranking quality, with the MRR@10 metric increasing from 73.96% to 77.84% [17][19].
- The results indicate an enhanced ability to rank more relevant results higher, validated by online A/B testing [20].

Group 4: Open Source Dataset
- The team released a new dataset named StructDocRetrieval, containing long documents with structural annotations, significantly surpassing typical short-document datasets like MS MARCO [21][22].
- The dataset, in HTML format, provides rich structural semantic annotations, filling a gap in the field [23].

Group 5: Broader Implications
- SEAL's refined understanding of structural information can provide more reliable information sources for downstream tasks, such as helping AI assistants accurately locate answers in technical documents [25].
- The framework shows promising applications in specialized fields like enterprise knowledge bases and legal technology [25].
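The two training strategies above can be sketched with a toy in-batch contrastive objective. This is a minimal illustration, not SEAL's actual implementation: `info_nce` and `strip_structure_tags` are hypothetical helpers, and a real SAL setup would encode the tagged and untagged document views with the language model rather than use raw vectors.

```python
import re
import numpy as np

def info_nce(query_vecs, doc_vecs, temperature=0.05):
    """In-batch contrastive loss: query_vecs[i] should match doc_vecs[i].

    Rows are L2-normalised before scoring, so the logits are cosine
    similarities scaled by 1/temperature."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    logits = q @ d.T / temperature          # (batch, batch) similarity matrix
    # Log-softmax over each row; the positive pair sits on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return -log_probs[idx, idx].mean()

def strip_structure_tags(text):
    """Crude tag removal, yielding the 'plain' view of a structured document."""
    return re.sub(r"<[^>]+>", " ", text)

# SAL-style paired views of one document: with and without structural tags.
tagged = "<h1>Quick start</h1><p>Install via pip.</p>"
plain = strip_structure_tags(tagged)
```

With perfectly matched query/document pairs the loss approaches zero; mismatched pairs push it up, which is what drives the embeddings of a query and its target document element together.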
A robot dog keeps walking even with a leg sawn off! The latest robot brain comes from a unicorn valued at 32 billion yuan
量子位· 2025-09-25 11:42
Core Viewpoint
- Skild AI has developed a revolutionary AI brain, Skild Brain, capable of controlling various robots in unpredictable situations, and reached a valuation of $4.5 billion as of June 2025 [4][29].

Group 1: Skild Brain Capabilities
- Skild Brain can adapt to different robot bodies and situations, controlling robots even when they face unexpected challenges like motor jams or limb loss [7][12].
- The AI brain was trained in a virtual environment simulating 100,000 different robot configurations over roughly 1,000 years of simulated time, leading to emergent control capabilities [4][12].
- It can learn from failures and improve its performance over time, demonstrating a memory horizon more than 100 times longer than typical robot control strategies [17][24].

Group 2: Testing and Adaptation
- Skild Brain successfully adapted to scenarios such as simulated limb loss, adjusting its walking pattern accordingly where traditional controllers failed [19][20].
- The AI can switch control strategies based on the robot's physical state, for example transitioning from wheeled locomotion to a bipedal walking pattern when necessary [21][24].
- Initial instability in new configurations, like walking on stilts, was quickly overcome as the AI adjusted its movements to maintain balance [22][24].

Group 3: Company Background and Funding
- Skild AI was founded in 2023 to develop adaptive AI for different hardware and tasks, starting with a small team of approximately six employees [25].
- The company has raised a total of $414 million across seed, Series A, and Series B rounds, with notable investors including SoftBank, Nvidia, and Sequoia Capital [29].
- Its valuation rose from $1.5 billion after the Series A round in July 2024 to $4.5 billion following a $100 million round in June 2025 [29].
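The article gives no training code, but the "100,000 different robot configurations in simulation" idea is classic domain randomization: train one policy across many randomized bodies so it cannot overfit to a single morphology. The sketch below only illustrates that loop; all parameter names and ranges (`num_legs`, `leg_length_m`, `motor_failures`) are invented for illustration.

```python
import random

def sample_morphology(rng):
    """Draw one randomised robot configuration for a training episode.

    The ranges are purely illustrative; the article only states that
    training covered ~100,000 distinct configurations in simulation."""
    return {
        "num_legs": rng.choice([2, 4, 6]),
        "leg_length_m": rng.uniform(0.2, 0.8),
        "motor_failures": rng.sample(range(12), k=rng.randint(0, 2)),
        "payload_kg": rng.uniform(0.0, 5.0),
    }

def train(policy_update, episodes=100_000, seed=0):
    """Domain-randomisation loop: a fresh morphology every episode, so a
    single policy must learn to adapt rather than memorise one body."""
    rng = random.Random(seed)
    for _ in range(episodes):
        morphology = sample_morphology(rng)
        policy_update(morphology)  # stand-in for a simulated rollout + gradient step
```

Randomly disabling motors during training is also how a policy can later tolerate real motor jams or limb loss: the failure mode was already in its training distribution.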
Your AI assistant just got more versatile! Tianxi partners with ByteDance's Coze to unlock endless new functions
量子位· 2025-09-25 11:42
Core Viewpoint
- The collaboration between Lenovo's Tianxi Super Intelligent Agent and ByteDance's Coze platform marks a significant step toward an integrated AI ecosystem, enhancing AI capabilities and user experience through a unified entry point for various AI applications [1][9].

Group 1: Tianxi Super Intelligent Agent Features
- Tianxi Super Intelligent Agent is designed as the "AI brain" for smart terminal devices, integrating voice, text, and visual interaction capabilities, along with full-time memory and autonomous planning [1].
- It offers five key functions: AI control, AI search, AI translation, AI note-taking, and AI services, providing a comprehensive smart experience across devices and ecosystems [1].

Group 2: Collaboration with the Coze Platform
- The partnership with Coze addresses a core challenge for AI developers, that agents are easy to build but hard to distribute, creating a streamlined path from AI concept to commercialization [3].
- Developers can efficiently create personalized intelligent agents on the Coze platform and deploy them seamlessly to Tianxi-enabled devices, significantly shortening launch cycles and cutting operational costs [3].

Group 3: User Experience Enhancements
- Integrating Coze's AI capabilities into Tianxi lets users access multiple AI functions through a single entry point, simplifying the experience and lowering the barrier to AI usage [6][8].
- Users can now call on specialized AI agents for tasks such as travel planning or language practice, enhancing the overall value and convenience of the Tianxi platform [8].

Group 4: Future of the AI Ecosystem
- The collaboration reinforces Lenovo's commitment to an open, inclusive AI ecosystem, with continued growth expected as more partners and developers join the Tianxi AI ecosystem [9].
- The vision is for Tianxi to evolve into a "silicon brain" connecting devices, data, and scenarios, weaving seamless AI functionality into everyday life [9].
JD open-sources its AI in one sweep! A raft of core projects go fully open source, and its 10,000-star GitHub project has new progress too
量子位· 2025-09-25 11:42
Core Insights
- The article highlights the advancements of domestic AI agents, particularly JoyAgent, which has achieved significant accuracy improvements in global evaluations, positioning itself among the top tier of AI agents worldwide [1][10][43].

Group 1: JoyAgent and Its Features
- JoyAgent is the first fully open-source enterprise-level AI agent, allowing businesses to deploy it without additional development [7][10].
- The recent upgrade to JoyAgent 3.0 includes the open-sourcing of the DataAgent and DCP data governance modules, addressing data utilization challenges in enterprises [11][13].
- JoyAgent 3.0 achieved a validation accuracy of 77% and a test accuracy of over 67% in the GAIA evaluation, reflecting its robust performance [1][43].

Group 2: Open Source Initiatives
- JD Cloud has systematically open-sourced its AI capabilities, including the medical model 京医千询2.0, which integrates trustworthy reasoning and multimodal capabilities [5][53].
- The OxyGent multi-agent framework allows developers to assemble AI teams through a simple Python interface, promoting flexibility and ease of use [46][48].
- The open-source strategy aims to create a comprehensive ecosystem that addresses industry pain points and facilitates the practical application of AI technologies [72][76].

Group 3: Industry Impact and Future Directions
- JD's open-source efforts are designed to lower the barriers for enterprises to adopt AI technologies, turning complex business scenarios into accessible solutions [73][76].
- The initiative encourages collaboration among developers, fostering a community that can innovate and build new applications on proven technologies [73][75].
- By establishing a unified technical standard through projects like the DGP data governance protocol, JD aims to enhance interoperability and drive industry-wide advancements [75][76].
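OxyGent's real API is not shown in the article; the fragment below is a hypothetical sketch of the kind of "assemble an AI team in a few lines of Python" interface described, with invented `Agent` and `Team` classes and a simple hand-off pipeline standing in for real LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]  # takes a task string, returns a result

@dataclass
class Team:
    agents: List[Agent] = field(default_factory=list)

    def add(self, agent: Agent) -> "Team":
        self.agents.append(agent)
        return self  # returning self keeps team assembly chainable and terse

    def run(self, task: str) -> str:
        # Simple pipeline topology: each agent refines its predecessor's output.
        result = task
        for agent in self.agents:
            result = agent.handle(result)
        return result

# Assemble a two-agent "team"; the lambdas stand in for model-backed agents.
team = (Team()
        .add(Agent("planner", lambda t: f"plan({t})"))
        .add(Agent("coder", lambda t: f"code({t})")))
```

A pipeline is only one possible topology; a real multi-agent framework would also support routing, parallel fan-out, and tool calls, but the builder-style assembly shown here is the pattern the "simple Python interface" claim suggests.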
A Chinese team redefines "Stargate"! The world's first space computing constellation is now in regular commercial operation
量子位· 2025-09-25 11:42
Core Insights
- The article discusses the successful deployment of traffic recognition models on satellites, marking a significant advance in the use of space-based AI for urban traffic analysis [4][15][22].
- This achievement marks the transition of space computing from experimental to operational, establishing a new paradigm for AI deployment in the industry [15][23].

Group 1: Space-Based AI Capabilities
- The complete process of image collection, model inference, and structured result transmission was executed in orbit, demonstrating the feasibility of on-satellite computation [2][10].
- The task was supported by the space computing constellation launched by Guoxing Aerospace, which is now in regular commercial operation [5][6].
- The system can support models with billions of parameters and offers full-process capabilities including image acquisition, model inference, task scheduling, and communication [12][13].

Group 2: Commercialization and Operationalization
- The successful execution of the task by the team from Jiadu Technology marks the first commercial use of the global space computing constellation [9][15].
- Guoxing Aerospace has become the first company globally to provide regular satellite-based space computing services, a milestone for the AI field [15][22].
- The "Star Computing" plan aims to build a green, low-carbon space computing infrastructure with total computing power exceeding 100,000 PetaFLOPS [12].

Group 3: Implications for AI Deployment
- Running AI models in orbit adds a new dimension to data processing, reducing response times significantly by processing data at the source [21][22].
- This shift changes not only the physical location of computation but also the system architecture, enabling faster decision-making for industries that require rapid assessments [20][22].
- The initiative redefines space as an integral part of intelligent systems, transforming it from merely a data source into an active processing environment [19][23].
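The in-orbit pipeline described above, image collection, on-board inference, and downlink of structured results, can be sketched as follows. Every function is an illustrative stand-in (the article gives no implementation details); the point the sketch makes is the bandwidth argument: only a compact JSON result leaves the satellite, never the raw imagery.

```python
import json
from datetime import datetime, timezone

def acquire_image():
    """Stand-in for the on-board camera; returns raw pixel intensities."""
    return [[0.1, 0.9], [0.8, 0.2]]

def run_inference(image):
    """Stand-in for the on-orbit traffic-recognition model."""
    pixels = [p for row in image for p in row]
    vehicles = sum(1 for p in pixels if p > 0.5)  # toy threshold "detector"
    return {"vehicle_count": vehicles}

def downlink_packet(result):
    """Serialise only the compact structured result for transmission;
    the raw imagery never needs to leave the satellite."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "result": result,
    })

# Full on-orbit loop: acquire -> infer -> downlink structured result.
packet = downlink_packet(run_inference(acquire_image()))
```

A raw multispectral scene can run to gigabytes, while the packet above is a few hundred bytes, which is why processing at the source shortens response times so dramatically for ground users.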
"iFold", Apple's new AI achievement
量子位· 2025-09-25 11:42
Core Viewpoint
- Apple has launched a cross-domain AI model named SimpleFold for protein folding prediction, informally nicknamed "iFold" by users [1].

Group 1: Model Overview
- SimpleFold uses a deliberately straightforward design built from general-purpose Transformer modules, with its 3-billion-parameter version achieving performance comparable to Google DeepMind's AlphaFold2 [2][8].
- The model simplifies the complex pipeline of protein folding prediction, making it more accessible to ordinary laboratories [3][7].

Group 2: Technical Details
- The core of protein folding prediction is inferring a protein's three-dimensional structure from its amino acid sequence [5][6].
- SimpleFold employs a multi-layer Transformer encoder as its backbone, adapting protein sequence features through adaptive layer normalization [10].
- The key innovation is flow matching generation, which learns a smooth mapping from a random noise distribution to the protein conformation distribution, enabling one-step generation of atomic coordinates [11][12].

Group 3: Performance Metrics
- SimpleFold was trained on 9 million entries, yielding models at multiple scales from 100 million to 3 billion parameters; the 3-billion-parameter model reaches 95% of AlphaFold2's performance on the CAMEO22 benchmark [14].
- On the high-difficulty CASP14 test set, SimpleFold outperformed comparable models such as ESMFold [15].

Group 4: Efficiency
- On a MacBook Pro with the M2 Max chip, SimpleFold can process a 512-residue sequence in two to three minutes, far faster than traditional models that require hours [18].

Group 5: Research Team
- Lead author Yuyang Wang holds degrees from Tongji University and Carnegie Mellon University, with a background spanning mechanical engineering and machine learning [18].
- Corresponding author Jiarui Lu was educated at Tsinghua University and Carnegie Mellon University, and has contributed to Apple's open-source project ToolSandbox [21][22].
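Flow matching, the generation technique credited above, can be illustrated with a toy Euler sampler: integrate a learned velocity field from Gaussian noise at t=0 toward the data distribution at t=1. This is a generic sketch of the technique, not SimpleFold's code; `toward_target` is a stand-in for the trained network, and the real model's conditioning on protein sequence features is omitted entirely.

```python
import numpy as np

def sample_conformation(velocity_field, dim=3, steps=50, seed=0):
    """Toy flow-matching sampler: Euler-integrate a velocity field from
    Gaussian noise at t=0 toward the data distribution at t=1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # start from random noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_field(x, t)
    return x

# Stand-in for a trained network: a rectified-flow-style field that moves
# every point toward one fixed "structure" (a real model predicts per-input
# velocities conditioned on the amino acid sequence).
target = np.array([1.0, 2.0, 3.0])

def toward_target(x, t):
    return (target - x) / max(1.0 - t, 1e-3)
```

In a real sampler `dim` would be 3 coordinates per atom rather than a single point, and the number of integration steps trades speed against accuracy, which is where the reported minutes-on-a-laptop efficiency comes from.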
Not just editing! Jianying's future is a one-stop AI video platform
量子位· 2025-09-25 02:21
Core Viewpoint
- The article emphasizes that Jianying (剪映) aims to transform from a simple video editing tool into a comprehensive AI creative partner, focused on an all-in-one solution for video creation [2][4][30].

Group 1: AI Integration and Functionality
- Jianying has upgraded its AI text-to-video capabilities, enhancing efficiency and storytelling coherence through deep integration with models like Doubao and DeepSeek [10][12].
- The platform now supports a wide range of materials, including raw images and videos, and offers a one-click AI rough-cut feature that simplifies initial editing [15][16].
- New video transition features allow for seamless transitions between frames, creating a cinematic effect [18][19].

Group 2: Comprehensive Creative Process
- Jianying's AI capabilities cover the entire creative process, from inspiration and material generation to precise editing and output optimization [7][28].
- New AI music features, including lyric modification that retains the original melody, enhance the audio editing experience [22].
- The platform has expanded its image creation capabilities, allowing batch creative generation for cover and poster designs [24][25].

Group 3: Future Vision and Market Positioning
- Jianying's slogan "All in AI, All in One" reflects its ambition to redefine video editing by integrating all necessary functions into a single platform [29][30].
- The company aims to become a co-creative partner that understands and anticipates creators' needs, streamlining the creative process [35][37].
- By eliminating redundant tasks, Jianying lets creators concentrate on their imaginative work, positioning itself as a leader in the AI creative tool market [38].
Your fastest Android chip has arrived! Fully paving the way for AI agents
量子位· 2025-09-25 02:21
Core Insights
- Qualcomm has launched what it bills as the world's fastest Windows PC processor and mobile SoC, focusing on AI capabilities for both PCs and smartphones [1][5][27].
- The Snapdragon X2 Elite Extreme is designed for high-end PCs, enabling advanced AI experiences and complex data analysis [15][24].
- The new Snapdragon 8 series mobile platform aims to support personalized AI assistants through continuous learning and real-time perception [1][27].

Group 1: AI and Computing Architecture
- AI is being positioned as the new user interface, shifting computing from smartphone-centric to agent-centric [6].
- A new computing architecture is required to support this transition, with enhanced edge data relevance and mixed cloud-edge model development [6].
- 6G technology is expected to bridge cloud, edge, and terminal connections [6].

Group 2: Snapdragon X2 Elite Series
- The Snapdragon X2 Elite series uses a 3nm process and the third-generation Oryon architecture, featuring 12 Prime cores and 6 Performance cores [7].
- Compared with the previous generation, CPU efficiency has improved by 31% and power consumption has decreased by 43% [10].
- At peak, single-core CPU performance is up 39%, multi-core up 50%, GPU performance 2.3 times higher, and NPU performance up 78% [13].

Group 3: Performance Comparisons
- The Snapdragon X2 Elite Extreme achieves a 75% performance increase at the same power consumption as competitors, which would need 222% more energy to match it [16][17].
- In single-core performance it leads by 44%, with competitors needing 144% more energy to catch up [20].
- In GPU performance it is 52% faster at the same power consumption, with competitors needing 92% more energy to achieve similar performance [22].

Group 4: Snapdragon 8 Elite Gen 5
- The fifth-generation Snapdragon 8 Elite also employs a 3nm process and the third-generation Oryon architecture [25].
- It shows a 20% increase in single-core performance and a 17% increase in multi-core performance, becoming the fastest mobile CPU [27].
- The upgraded Adreno GPU offers a 23% improvement in gaming performance and a 25% increase in ray-tracing performance [28].

Group 5: Power Efficiency and Features
- Overall power consumption has decreased by 16%, with CPU power down 35% and GPU power down 20% [33].
- The upgraded ISP supports advanced video encoding and AI enhancements for video processing [33].
- The integrated X85 5G Modem-RF system enhances AI-driven Wi-Fi capabilities, reducing gaming latency by 50% [34].
Huawei's watches and earbuds both get updates! The prices can't match Apple's, and Apple can't match the battery life
量子位· 2025-09-25 01:06
Core Viewpoint
- Huawei's recent product launch is not just about new devices; it aims to redefine the entire wearable audio experience by addressing overlooked "real problems" in user experience [5][48].

Group 1: HUAWEI WATCH GT 6 Series
- The WATCH GT 6 series, comprising the GT 6 and GT 6 Pro, keeps the familiar business aesthetic while significantly enhancing internal functionality [6][7].
- Battery capacity has increased by 65%, allowing the 46mm version to last up to 21 days in light-usage mode [10][11].
- The new-generation Sunflower positioning system improves location accuracy by 20%, making it effective for outdoor activities in complex environments [15][16].
- A cycling power estimation feature lets users monitor their cycling power in real time without additional equipment [20][22].
- The new Xuanji perception system can recognize up to 12 emotions, providing a more personalized user experience [24].
- Health monitoring covers heart rate, sleep, and stress tracking, with a new atrial fibrillation burden statistic [26][27].

Group 2: HUAWEI FreeClip 2 Earphones
- The FreeClip 2 earphones weigh only 5.1g per ear, making them extremely light and suitable for all-day wear [34].
- The design has been optimized for stability, staying in place during physical activity without discomfort [35].
- Equipped with a new self-developed audio chip and NPU AI processor, the earphones automatically adjust volume based on the surrounding noise environment [37][38].
- Total battery life reaches 38 hours, with 9 hours of single-ear use, and translation is supported in 20 languages [41][42].
- An offline locating function adds everyday convenience [43].

Group 3: HUAWEI Vision Smart Screen 5 Pro
- The Vision Smart Screen 5 Pro starts at 6,499 yuan, featuring flagship-level picture quality and sound [44][45].
- The device has been slimmed to a thickness of only 49mm, a 23% reduction in size compared with the previous generation [46].

Group 4: Overall Product Strategy
- The launch emphasizes practical upgrades that address everyday issues such as battery life, device loss, and design aesthetics, without relying on flashy marketing [48][49].
- Huawei's approach focuses on solving small problems through thoughtful design and functionality, enhancing the overall user experience [50].
LeCun's team open-sources the first code world model: it generates code, then tests and fixes it itself! Traditional coding models turned classical overnight
量子位· 2025-09-25 01:06
Core Insights
- Meta FAIR has launched the Code World Model (CWM), a 32-billion-parameter language model designed for code generation and reasoning, marking the first systematic introduction of world modeling into code generation [1][2][4].

Group 1: Model Capabilities
- CWM distinguishes itself by not only generating code but also understanding its execution, simulating variable state changes and environmental feedback, which enhances overall code comprehension and debugging [2][9].
- The model performs close to GPT-4, scoring 65.8% on the SWE-bench Verified benchmark and outperforming all open-source models of similar scale [4][31].
- CWM introduces code world modeling during training, teaching the model how program state evolves during execution and moving it from static text understanding to dynamic execution comprehension [15][26].

Group 2: Enhanced Features
- CWM can simulate code execution line by line, predicting how each line affects variable state and identifying potential runtime errors, paving the way for a "neural debugger" [18][19].
- The model can self-test and self-correct: after generating code it automatically produces test cases and tries multiple modification paths to fix errors, mimicking the human cycle of writing, testing, and revising [22][24].
- CWM exhibits reasoning and planning abilities, analyzing problem descriptions, planning function structure, and generating and validating code through iterative logical reasoning [25].

Group 3: Model Architecture and Training
- CWM employs a 64-layer decoder-only Transformer with 32 billion parameters and supports long-context input of 131,072 tokens, significantly enhancing its handling of complex projects and multi-file code [26][27].
- Training proceeds in three phases: pre-training on 8 trillion tokens, mid-training on 5 trillion tokens focused on world modeling, and a final stage of supervised fine-tuning on 100 billion tokens plus multi-task reinforcement learning on 172 billion tokens [38][47].
- Training used advanced techniques such as FlashAttention-3 and distributed environments, ensuring robust performance across tasks [50][51].

Group 4: Future Directions and Limitations
- CWM's world-modeling data is currently limited to Python, with multi-language support planned, aiming toward a universal framework for automated programming assistance [53][54].
- CWM is intended primarily for research; it is not designed for dialogue tasks or chatbot applications, keeping its focus on code understanding and complex reasoning [55][56].
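The self-test-and-repair behavior described above follows a write-test-revise loop that can be sketched as below. The helpers are illustrative stand-ins, not CWM's interface: `generate_tests` fakes the model's self-generated test cases for a hypothetical "add two ints" spec, and the candidate functions stand in for successive model generations.

```python
def generate_tests(spec):
    """Stand-in for CWM's self-generated test cases: fixed input/expected
    pairs for a hypothetical 'add two ints' spec."""
    return [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

def run_tests(fn, tests):
    """Run a candidate against every test; True where the output matches."""
    return [fn(*args) == expected for args, expected in tests]

def write_test_revise(candidates, spec, max_attempts=3):
    """Write-test-revise loop: accept the first candidate implementation
    that passes every self-generated test, like the human cycle of
    writing, testing, and revising."""
    tests = generate_tests(spec)
    for attempt, fn in enumerate(candidates[:max_attempts]):
        if all(run_tests(fn, tests)):
            return fn, attempt
    return None, max_attempts

buggy = lambda a, b: a - b  # the first "generation" contains a bug
fixed = lambda a, b: a + b  # a revised attempt repairs it
best, attempts = write_test_revise([buggy, fixed], "add two ints")
```

What the world-modeling training adds on top of this loop is the ability to predict, line by line, why a failing test fails, so revisions are guided by simulated execution state rather than blind resampling.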