Storming the Tokyo Game Show: Even the Game Show Has Gone AI
量子位· 2025-09-27 07:00
Core Viewpoint
- The article highlights the significant presence and influence of Chinese companies at the Tokyo Game Show (TGS), showcasing advancements in AI technology and its integration into the gaming industry [1][36].

Group 1: Chinese Companies at TGS
- Major Chinese gaming companies such as NetEase and Tencent have established impressive exhibition spaces, attracting numerous players [2][8].
- AI companies are also making their mark at TGS, demonstrating their capabilities and innovations in the gaming sector [8][10].

Group 2: AI Technology Showcase
- Alibaba's booth prominently featured its open-source models, including Tongyi Qianwen and Tongyi Wanxiang, offering a range of commercial solutions from IaaS to SaaS [11][12].
- The Model Studio platform and the AI development platform PAI were highlighted as part of Alibaba's offerings, indicating a strong push for AI integration in gaming [13][15].

Group 3: 3D Generation Technology
- Tencent Cloud emphasized its cloud computing capabilities for game security and operations, while also discussing the potential of mixed-reality 3D technology [21][22].
- VAST's Tripo, a leading open-source 3D generation project, is gaining attention from game developers both domestically and internationally [26][27].

Group 4: AI Applications in Gaming
- HakkoAI, an AI gaming companion, showcased its ability to understand and interact with various games, outperforming several top general models in specific gaming scenarios [34].
- The integration of AI in gaming is creating new possibilities and enhancing player experiences, indicating a growing trend in the industry [36].
Making RAG Truly Read Between the Lines! New Framework Introduces Lexical Diversity and Sets New SOTA on Multiple Benchmarks
量子位· 2025-09-27 07:00
Core Insights
- The article discusses the introduction of the Lexical Diversity-aware RAG (DRAG) framework, which enhances the accuracy of Retrieval-Augmented Generation (RAG) models by 10.6% and sets new state-of-the-art (SOTA) results on multiple benchmarks [1][2][16].

Group 1: Framework and Innovations
- The DRAG framework systematically incorporates lexical diversity into the retrieval and generation processes of RAG, providing a lightweight, general, and easily extensible solution [1][5].
- The research team from Beihang University, Peking University, and Zhongguancun Laboratory highlights the importance of lexical diversity, which has been largely overlooked by existing RAG methods [4][5].
- Two key innovations are introduced:
  1. The Diversity-sensitive Relevance Analyzer (DRA), which dissects query semantics and applies differentiated strategies to the query's components, yielding more granular relevance scoring [9].
  2. Risk-guided Sparse Calibration (RSC), which monitors the "misleading risk" of each generated token and calibrates decoding as necessary, ensuring the generation phase is not disturbed by irrelevant information [11][14].

Group 2: Performance and Results
- The DRAG framework shows significant performance improvements across open-domain question-answering benchmarks, with accuracy increases of 4.9% on PopQA and 4.4% on TriviaQA, and a 10.6% increase on HotpotQA and 2WikiMultiHopQA [16].
- The method also outperforms existing models on long-answer generation metrics such as str-em and QA-F1, demonstrating strong generalization across model sizes, including Llama2-7B and Llama2-13B [16][18].

Group 3: Lexical Diversity Challenges
- The article identifies lexical diversity as a critical yet often neglected issue in RAG methods: different expressions of the same question can confuse retrieval models, leading to incorrect answers [5][8].
- The framework addresses this by allowing semantic flexibility for variable components while enforcing strict matching for invariant components, thus improving the relevance of retrieved documents [12].

Group 4: Future Directions
- The research team plans to extend the DRAG framework to more specialized scenarios, aiming to improve large models' understanding of complex human language expressions [5].
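The invariant-versus-variable split in Group 3 can be illustrated with a toy relevance score. This is a deliberately minimal sketch, not DRAG's actual formulation: the function names, the Jaccard overlap used as a stand-in for embedding similarity, and the scoring weights are all illustrative assumptions.

```python
# Toy sketch: invariant query components (entities, numbers) require exact
# matches, while variable components (paraphrasable words) contribute a
# graded similarity bonus. Everything here is an illustrative assumption,
# not the paper's actual scoring function.

def soft_overlap(a: set, b: set) -> float:
    """Jaccard overlap as a cheap stand-in for embedding similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def relevance(invariant: list, variable: list, doc: list) -> float:
    doc_set = set(doc)
    # Strict matching: a missing invariant component zeroes the score.
    if not set(invariant) <= doc_set:
        return 0.0
    # Semantic flexibility: variable components add a graded bonus.
    return 1.0 + soft_overlap(set(variable), doc_set)

docs = {
    "d1": "einstein received the nobel prize in physics in 1921".split(),
    "d2": "the nobel committee convened in 1921".split(),
}
scores = {name: relevance(["einstein", "1921"], ["prize", "award"], toks)
          for name, toks in docs.items()}
print(scores)  # d1 matches both invariants; d2 misses "einstein" and scores 0
```

A retriever built this way cannot be fooled by a paraphrase of the variable words, yet still refuses documents that drop the entities the question actually hinges on.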
The 84-Billion-Yuan AI Company That Lilian Weng and Danqi Chen Joined Publishes Its Second Paper
量子位· 2025-09-27 04:46
Core Viewpoint
- The article discusses the recent research paper from Thinking Machines, led by Jeremy Bernstein, on "Modular Manifolds," which aims to enhance the stability and efficiency of neural network training through a unified framework covering the network's different layers and modules [1][2].

Group 1: Research Motivation and Challenges
- The research aims to address fundamental challenges in neural network training, particularly issues with tensor values (weights, activations, gradients) that can lead to instability, gradient explosion or vanishing, and low training efficiency [2].
- The author proposes a new optimization approach called Modular Manifolds, which applies constraints not only to individual weight tensors but also views the entire network as a composite manifold structure [2][8].

Group 2: Importance of Manifold Constraints
- The necessity for manifold constraints arises from the instability encountered when training large models, where extreme values of weights, activations, or gradients can lead to overflow, vanishing, and slow convergence [8].
- Normalization methods have been the gold standard for addressing these issues, but little attention has been paid to normalizing the weight matrices themselves [8][9].

Group 3: Benefits of Weight Normalization
- Normalizing weight matrices can lead to more stable training, easier tuning, more predictable behavior, and greater resistance to external disturbances [9][10].

Group 4: Research Process Overview
- The research walks through several steps, starting with a basic example of training a parameter vector constrained to the unit sphere [11].
- The author discusses the limitation of standard optimizers like Adam or SGD, whose updates may leave the constraint space [12][13].
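The unit-sphere example above can be made concrete. A plain SGD step leaves the sphere, so each iteration projects the gradient onto the tangent space, takes the step, and renormalizes. The toy loss, target, and learning rate below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch of constrained training on the unit sphere: each iteration
# (1) projects the gradient onto the tangent space at w, (2) takes an
# ordinary gradient step, and (3) retracts by renormalizing. The toy loss
# L(w) = -<target, w> and the step size are illustrative assumptions.

def sphere_step(w: np.ndarray, grad: np.ndarray, lr: float = 0.1) -> np.ndarray:
    tangent = grad - np.dot(grad, w) * w   # remove the radial component
    w_new = w - lr * tangent               # ordinary gradient step
    return w_new / np.linalg.norm(w_new)   # retraction back onto the sphere

rng = np.random.default_rng(0)
target = np.array([0.0, 0.0, 1.0])
w = rng.normal(size=3)
w /= np.linalg.norm(w)

for _ in range(200):
    w = sphere_step(w, grad=-target)       # gradient of L(w) = -<target, w>

print(np.linalg.norm(w))  # stays at 1 (up to rounding) after every step
print(np.dot(w, target))  # approaches 1 as w aligns with the target
```

Note how the constraint is maintained by construction rather than by a penalty term: the optimizer can never produce an out-of-range weight, which is exactly the stability argument made above.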
Group 5: Manifold Optimization Techniques
- The manifold optimization approach projects the gradient onto the tangent space, updates the parameters, and then retracts the updated vector back onto the manifold [14].
- Different choices of manifold constraint and length measurement yield a family of optimization algorithms [16].

Group 6: Extension to Matrix Parameters
- The research extends the concept from vector parameters to matrix parameters, particularly the weight matrices in Transformers, which can have thousands of dimensions [17].
- The Stiefel manifold is proposed for matrix parameters, ensuring orthonormal column vectors and a condition number of 1, which aids numerical stability [18][20].

Group 7: Experimental Validation
- A small-scale experiment on the CIFAR-10 dataset compared the manifold Muon algorithm with AdamW; the former slightly outperformed the latter in training/testing accuracy, although it was slower in wall-clock time [23][24].

Group 8: Modular Manifolds Concept
- The concept of Modular Manifolds treats each layer or module of the neural network as a separate manifold with its own defined norms and optimization methods [26][27].
- These individual manifolds can be combined into a larger manifold space, where a global mechanism constrains the overall update while still allowing local updates [29][30].

Group 9: Future Implications
- The proposed methodology emphasizes co-design of the entire model training process; successful application to large Transformers or LLMs could significantly enhance training efficiency and stability [31][32].
- The company's valuation already exceeds $12 billion, indicating strong market expectations for its research outcomes [52].
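The Stiefel-manifold constraint in Group 6 can be sketched with a QR-based retraction: after an unconstrained update, the matrix is snapped back to orthonormal columns, restoring a condition number of 1. The QR retraction and the random "updated weight" below are illustrative assumptions; the post's actual algorithm (manifold Muon) may use a different retraction.

```python
import numpy as np

# Sketch of a retraction onto the Stiefel manifold (matrices with
# orthonormal columns). After any unconstrained step, retracting restores
# orthonormality, so the condition number returns to exactly 1.
# Illustrative assumption: QR as the retraction of choice.

def stiefel_retract(W: np.ndarray) -> np.ndarray:
    Q, R = np.linalg.qr(W)
    # Fix signs so the factorization is unique (positive diagonal of R).
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))        # stand-in for a weight matrix after an update
W = stiefel_retract(W)

print(np.allclose(W.T @ W, np.eye(3)))  # columns are orthonormal: True
print(round(np.linalg.cond(W), 6))      # condition number back to 1.0
```

A condition number of 1 means the layer neither amplifies nor crushes any input direction, which is the numerical-stability property Group 6 highlights.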
The Industry's First High-Quality Native 3D Part Generation Model Is Here! From Tencent's Hunyuan Team
量子位· 2025-09-27 04:46
Contributed by the Tencent Hunyuan 3D team | 量子位 QbitAI

The industry's first high-quality native 3D part generation model is here, from Tencent's Hunyuan 3D team.

Existing 3D generation algorithms typically produce monolithic 3D models, whereas downstream applications usually need semantically decomposable 3D shapes: each part of a 3D object must be generated separately, as the video below demonstrates.

Generally, part-based 3D generation has two main application scenarios:

1) Video game production pipelines: many game assets must be bound to different game logic according to semantic information. For example, a car model should decompose into a body and four rollable wheels, so that the wheels can spin on their own; part separation therefore matters. At the same time, the downstream pipeline of 3D geometry generation includes modules such as low-poly retopology and UV unwrapping, and these modules struggle with highly complex geometry. Splitting complex geometry into simple small parts, a divide-and-conquer strategy, greatly reduces the difficulty for these downstream algorithms.

2) 3D printing: this is also good news for the 3D printing industry. Users can print the parts one by one and then assemble them, like building blocks.

Finally, X-Part decomposes the overall shape into individual parts.

△ Figure 1. The overall Hunyuan3D-Part part-separation pipeline

Its technical highlights are:

1) It proposes P3-SAM, the industry's first native 3D segmentation model, leveraging large ...
The First Reasoning Embodied Model, Built by Google DeepMind! It Autonomously Understands, Plans, and Executes Complex Tasks, Breaks the One-Robot-One-Training Paradigm, and Even Transfers Skills Zero-Shot Between Robots
量子位· 2025-09-27 04:46
Core Viewpoint
- Google DeepMind has launched the Gemini Robotics 1.5 series, marking a significant milestone in the development of general AI for real-world applications, featuring embodied reasoning capabilities that allow robots to "think before acting" [1][9].

Group 1: Model Composition
- The Gemini Robotics 1.5 series consists of two main models: GR 1.5 for action execution and GR-ER 1.5 for embodied reasoning [2][8].
- GR-ER 1.5 is described as the world's first embodied model with simulated reasoning capabilities [3].

Group 2: Functional Capabilities
- The combination of GR-ER 1.5 and GR 1.5 enables robots to perform complex multi-step tasks, such as sorting clothes by color or packing luggage based on weather conditions [5][6].
- GR 1.5 can adapt to various robot hardware, allowing a single model to operate across different platforms without separate training for each [16][18].

Group 3: Motion Transfer Mechanism
- The innovative "Motion Transfer" mechanism allows skills learned on one robot to be transferred to another, enhancing cross-platform functionality [21][48].
- This mechanism abstracts different robots' actions into a unified semantic space, enabling seamless skill sharing across diverse hardware [56].

Group 4: Safety and Explainability
- The GR 1.5 series enhances safety by allowing robots to self-correct during tasks and recognize potential risks, ensuring safe operation in human environments [34][36].
- The embodied reasoning model makes the robot's decision-making process transparent, improving interpretability and trust [55][58].

Group 5: Performance Metrics
- In benchmark tests, GR 1.5 outperformed previous models across dimensions including instruction generalization and task completion rate, reaching nearly 80% on long-sequence tasks [61][62].
- The model demonstrated unprecedented zero-shot transfer capabilities in cross-robot migration tests [63].
Group 6: Future Developments
- The GR 1.5 series represents a shift from executing single commands to genuinely understanding and solving physical tasks [69].
- Developers can currently access GR-ER 1.5 through Google AI Studio, while GR 1.5 is available to select partners [71].
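The Motion Transfer idea in Group 3 can be caricatured in a few lines: skills are expressed in a shared, robot-agnostic action space, and per-robot adapters decode them into hardware-specific commands, so a skill recorded once runs on any robot with an adapter. The skill vocabulary and command names below are invented for illustration; GR 1.5's actual mechanism is learned end to end, not a lookup table.

```python
# Deliberately tiny caricature of a shared semantic action space with
# per-robot decoders. All names here are invented for illustration.

SHARED_SKILL = ["reach", "grasp", "lift"]      # robot-agnostic skill sequence

ADAPTERS = {
    "arm_a": {"reach": "move_j1_j2", "grasp": "close_gripper", "lift": "raise_j3"},
    "arm_b": {"reach": "servo_xy",   "grasp": "clamp",         "lift": "servo_z"},
}

def decode(skill: list, robot: str) -> list:
    """Translate a shared-space skill into one robot's command vocabulary."""
    return [ADAPTERS[robot][step] for step in skill]

# The same skill, zero-shot, on two different bodies:
print(decode(SHARED_SKILL, "arm_a"))  # ['move_j1_j2', 'close_gripper', 'raise_j3']
print(decode(SHARED_SKILL, "arm_b"))  # ['servo_xy', 'clamp', 'servo_z']
```

The point of the abstraction is that adding an Nth robot costs one adapter, not a retraining of every skill, which is what "breaking one-robot-one-training" amounts to.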
"Fine-Grained" Alignment for Large Models: Truthfulness Up 25.8%, a New SOTA! Token-Level Precision Editing, Training-Free and Plug-and-Play
量子位· 2025-09-27 04:46
Core Insights
- The article discusses a new method called Token-Aware Editing (TAE) that enhances the alignment of large language models (LLMs), achieving a 25.8% improvement in truthfulness metrics on the TruthfulQA task and setting a new performance benchmark [1][15].

Group 1: Methodology
- TAE is a token-aware representation-editing method applied at inference time that addresses the limitations of traditional representation-editing techniques; it requires no training and is plug-and-play across scenarios such as dialogue systems and content moderation [1][3].
- Existing methods often overlook the misalignment differences between tokens, leading to biased alignment directions and inflexible editing strengths [4][6].
- TAE consists of two main modules: Mutual Information-guided Graph Aggregation (MIG) and Misalignment-aware Adaptive Intervention (MAI) [8][10].

Group 2: Module Details
- MIG enhances the representational capability of activation values to find more accurate editing directions, addressing the information loss and local-understanding limitations of traditional methods [10].
- MAI computes an adaptive editing strength for each token based on its misalignment risk, enabling differentiated intervention that prevents over-correction of safe tokens and under-correction of dangerous ones [11][12].

Group 3: Experimental Results
- TAE significantly outperformed existing methods across metrics, achieving a True*Info score of 87.8% on the TruthfulQA dataset, surpassing the previous best method (SEA) by 14.6 percentage points and the original baseline by 25.8 percentage points [14][15].
- In toxicity-reduction tasks, TAE lowered the toxicity probability from a baseline of 0.41 to 0.05, a nearly 90% decrease, outperforming all specialized detoxification baselines [16].
- TAE also demonstrated substantial improvements on fairness tasks, lowering stereotype scores from a baseline of 64.8% to 50.3%, approaching the ideal unbiased state [16].

Group 4: Broader Implications
- The TAE method shows significant gains across model types and sizes, including Llama2-7B-Chat, Llama2-13B-Chat, Alpaca-7B, and Mistral-7B, indicating its versatility and effectiveness in enhancing model alignment [17].
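The per-token adaptive intervention of the MAI module can be sketched as follows: each token's hidden state is shifted along one "alignment direction," with a strength scaled by that token's estimated misalignment risk, so safe tokens are barely touched and risky tokens are edited hard. The risk scores, direction, and linear scaling are illustrative assumptions, not the paper's actual formulas.

```python
import numpy as np

# Hedged sketch of token-level adaptive editing: shift each token's hidden
# state along a unit "alignment direction", scaled by its misalignment risk.
# Risk scores, direction, and the linear scaling are illustrative assumptions.

def edit_hidden_states(hidden: np.ndarray, direction: np.ndarray,
                       risks: np.ndarray, max_strength: float = 2.0) -> np.ndarray:
    """hidden: (seq_len, d); direction: (d,); risks: (seq_len,) in [0, 1]."""
    direction = direction / np.linalg.norm(direction)
    strengths = max_strength * risks           # per-token editing strength
    return hidden + strengths[:, None] * direction[None, :]

hidden = np.zeros((3, 4))                      # three tokens, 4-dim states
direction = np.array([1.0, 0.0, 0.0, 0.0])     # alignment direction
risks = np.array([0.0, 0.5, 1.0])              # safe, borderline, risky token

edited = edit_hidden_states(hidden, direction, risks)
print(edited[:, 0])  # per-token shifts along the direction: [0. 1. 2.]
```

Contrast this with a uniform-strength edit, which is exactly the over/under-correction trade-off described in Group 2: one global strength either bulldozes safe tokens or leaves dangerous ones untouched.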
Hands-On With Kimi's New Agent Model 「OK Computer」: Very OK
量子位· 2025-09-27 01:30
Core Viewpoint
- Kimi has launched a new agent model named OK Computer, which showcases advanced capabilities in web development, data processing, and content generation [1][4][6].

Group 1: Design Tasks
- The new agent can autonomously create a Pygame-themed webpage, including sections on the history of Pygame, game showcases, core features, and development tutorials, demonstrating its ability to design and implement content independently [9][10][12].
- The model generates a todo list to track progress on tasks, marking completed items and allowing users to monitor the workflow [16].
- It can autonomously conduct web searches and generate the materials needed for webpage creation, showcasing its self-sufficiency in the design process [17].

Group 2: Generation Tasks
- The agent was tasked with writing a children's story and visualizing it as a picture book, which covered story writing, image generation, and audio production, highlighting its multi-modal content-creation capabilities [20][21].
- It also successfully produced an editable PowerPoint presentation on China's top ten original musicals, demonstrating its proficiency at generating presentation materials [22][24][26].

Group 3: Analysis Tasks
- The agent can handle data-analysis tasks by searching for financial data and visualizing it, relieving users of the burden of data collection and analysis [29][30].
- It can also analyze lengthy Excel documents and present the data clearly and understandably, indicating its effectiveness with complex data sets [31][32].
The First Open-Source, 100% Reproducible Stable RL Training Framework Is Here! Two Runs Produce Exactly Identical Results
量子位· 2025-09-27 01:30
Core Insights
- The article discusses the SGLang and slime teams' achievement of a fully reproducible, stable reinforcement learning (RL) training framework based on the Qwen3-8B model, addressing the problem of non-deterministic outputs in large language model (LLM) inference [1][2][6].

Group 1: Deterministic Inference
- The SGLang and slime teams developed a deterministic inference solution that integrates batch-invariant operators, CUDA Graph, radix cache, and chunked prefill, ensuring high performance while remaining compatible with key features [5][8].
- The batch-invariant operators address the core source of output uncertainty in LLM inference: varying batch sizes under dynamic batching [7][8].
- Testing shows the average performance drop for SGLang's solution is 34.35%, significantly better than the 61.5% decline reported by Thinking Machines Lab [5][12].

Group 2: Performance Metrics
- The article presents performance metrics for different inference modes, showing that deterministic modes yield consistent outputs across batch sizes, with unique output counts significantly reduced [10][11].
- In end-to-end latency, deterministic inference shows a performance drop of 25% to 45%, with specific backend metrics indicating improvements in certain configurations [12][13].

Group 3: Future Developments
- Future work will focus on optimizing the batch-invariant operators for better performance, particularly for RL inference, and extending support to mixture-of-experts (MoE) models [16][18].
- The team also aims to improve radix cache functionality and explore tensor parallelism to further strengthen deterministic inference [18].
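Why does batch size change model outputs at all? Floating-point addition is not associative, so when dynamic batching changes how a kernel tiles a reduction, the same mathematical sum can round differently; batch-invariant operators fix the reduction order so the result no longer depends on batch size. The three float32 numbers below are chosen purely to make the effect obvious.

```python
import numpy as np

# Floating-point addition is not associative: regrouping the same three
# float32 numbers changes the result, which is exactly what happens when
# dynamic batching retiles a reduction. The numbers are illustrative.

a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)

left = (a + b) + c   # cancellation happens first: 0 + 1 -> 1.0
right = a + (b + c)  # b + c rounds back to -1e8 in float32, the 1 is lost -> 0.0

print(left, right)   # same numbers, different grouping, different result
```

An LLM forward pass performs millions of such reductions, so tiny grouping-dependent rounding differences get amplified by argmax/sampling into visibly different tokens; pinning the grouping is what makes two RL runs bitwise identical.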
Qualcomm Hosts a Roundtable, and Unitree's Wang Xingxing Delivers Some Hard Truths
量子位· 2025-09-26 09:12
Core Viewpoint
- The article discusses the challenges and opportunities in embodied intelligence and robotics, emphasizing the importance of collaboration among industry players to address technical difficulties and accelerate progress [3][25][48].

Group 1: Industry Challenges
- The current state of robotics is characterized by diverse technical routes and, despite the apparent excitement in the field, a lack of significant progress [4][25].
- Many robotics and chip manufacturers overlook the critical role of chips in robotics, which is essential for enhancing performance and reliability [16][18].
- The industry struggles to deploy large-scale computing power inside robots because of space constraints, battery capacity, and heat dissipation [20][21].

Group 2: Technological Developments
- The goal of companies like Unitree is to develop universal AI for robots that can perform various tasks in unfamiliar environments, akin to a "ChatGPT moment" for robotics [11][12].
- The development stages toward advanced robotic capabilities run from fixed action demonstrations, through real-time action generation and task execution in unfamiliar settings, to high success rates on delicate operations [12].
- Embodied intelligence in robots may eventually run on mobile-phone chips, which could offer significant potential for innovation [24].

Group 3: Collaboration and Open Source
- The article highlights the importance of open-sourcing models to foster collaboration and accelerate advances in the field, similar to OpenAI's approach with its earlier GPT models [28][29].
- Companies are encouraged to stay open to a variety of models and to collaborate with third parties on development [30][31].

Group 4: AI and Agent Systems
- The article discusses the role of agent systems in AI, emphasizing the need for device-cloud collaboration to improve user experience and privacy [35][36].
- Demand for on-device models is rising, as they are crucial for understanding user needs and mediating communication with cloud models [39][40].
- The industry lacks a unified standard for AI applications across different devices, leading to high development costs and fragmentation [48][50].

Group 5: Future Directions
- The future of AI in robotics and other sectors will likely involve a cross-terminal operating system that integrates various services and enhances user experience [50][51].
- Collaboration among industry players is essential to build the necessary infrastructure and support innovation in smart devices [51].
A Key Figure Behind Gemini Joins xAI, and Musk Personally Rolls Out the Welcome!
量子位· 2025-09-26 09:12
Core Viewpoint
- Dustin Tran, a former senior researcher at Google DeepMind, has joined xAI; he is recognized for his significant contributions to the Gemini AI model, which achieved state-of-the-art reasoning capabilities and won multiple prestigious competitions [1][2][12].

Group 1: Dustin Tran's Contributions
- Tran played a pivotal role in the development of the Gemini product line, which helped Google regain its footing in the AI landscape after being upstaged by GPT [2][12].
- Under Tran's leadership, the Gemini series, particularly Gemini 1.5 Pro, excelled in various AI benchmarks, marking a significant turnaround for Google [15][16].
- Tran's team was instrumental in the rapid development of Gemini's predecessor, Bard, despite its initially poor reception [13][14].

Group 2: Transition to xAI
- Tran's decision to join xAI was driven by three main factors: superior computing power, innovative data strategies, and alignment with Elon Musk's corporate philosophy [27][28][29].
- He expressed admiration for the extensive resources available at xAI, which he found unparalleled even compared with his tenure at Google [30][31].
- Tran believes xAI can achieve rapid advances in AI capabilities, surpassing other companies within a short timeframe [35][36].

Group 3: Background and Achievements
- Tran has a strong academic background: he graduated from UC Berkeley, earned a master's degree from Harvard, and pursued a PhD at Columbia University [22].
- He has contributed to several influential projects and publications in AI, with over 24,000 citations on Google Scholar [23][25].
- His early career included a brief internship at OpenAI, where he worked on notable projects such as the Dota 2 AI [19][21].