量子位
英伟达推出通用深度研究系统,可接入任何LLM,支持个人定制
量子位· 2025-09-08 05:04
Core Viewpoint
- NVIDIA has introduced a Universal Deep Research (UDR) system that supports personalized customization and can interface with any large language model (LLM) [1][2].

Summary by Sections

General Overview
- The UDR system allows users to fully customize deep research strategies and delegate tasks to intelligent agents [2][10].
- A user interface prototype for UDR is available for download on GitHub, showcasing its versatility [3].

Features and Innovations
- UDR enables users to create, edit, and optimize customized deep research strategies without additional training or fine-tuning [6].
- The system compiles strategies from natural language into executable research orchestration code and delivers final reports to users [11].
- Key innovations include:
  - Customizable research strategies defined in natural language, which the system converts into executable code [12].
  - A decoupled architecture that allows any LLM to be integrated into a complete deep research tool [13].
  - Enhanced product design flexibility, enabling advanced AI models to be paired with tailored research solutions [14].

User Interface and Control
- The prototype showcases four practical functions: real-time strategy modification, preset strategy library selection, progress notifications, and report viewing [15].
- The interface includes a code agent for coordinating LLMs and tools, but lacks user control over resource prioritization and information verification [16].

Efficiency and Cost Management
- UDR improves computational efficiency by separating control logic from LLM reasoning; the entire research process is managed by generated code running on the CPU [19].
- The system calls the LLM only when user-defined strategies require it, significantly reducing GPU resource consumption and overall execution costs [20].

Limitations and Future Improvements
- The accuracy of UDR's execution of research strategies depends on the quality of the underlying AI model's code generation [21].
- The system assumes user-designed strategies are reasonable and executable, performing only basic checks [21].
- Current limitations include a lack of user intervention during execution and the need for all decisions to be pre-set, which reduces flexibility for long-term or exploratory research tasks [22].
- Proposed improvements include customizable strategy libraries and enhanced user control over LLM reasoning processes [23].

Current Status
- The UDR system is still in the prototype phase and has not been officially launched, but a fully functional version is expected in the future [25].
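The decoupling described above — a strategy compiled into ordinary code that runs on the CPU and invokes the LLM only at strategy-defined points — can be illustrated with a minimal sketch. This is not NVIDIA's actual UDR code; the strategy steps, the `llm_call` stub, and the `web_search` tool are all hypothetical placeholders.

```python
# Hedged sketch of a UDR-style orchestrator: the control loop is plain
# Python running on the CPU; the (stubbed) LLM is invoked only where a
# strategy step explicitly asks for it.

def llm_call(prompt: str) -> str:
    """Stub for a real LLM API call; UDR can plug in any model here."""
    return f"<summary of: {prompt[:40]}>"

def web_search(query: str) -> list[str]:
    """Stub tool; a real system would hit a search API."""
    return [f"doc about {query} #{i}" for i in range(3)]

def run_strategy(topic: str) -> str:
    # Step 1 (pure CPU logic): gather sources with a tool, no LLM involved.
    sources = web_search(topic)
    # Step 2 (the only LLM calls): summarize each source.
    notes = [llm_call(f"Summarize: {s}") for s in sources]
    # Step 3 (pure CPU logic): assemble the final report.
    return "\n".join([f"Report on {topic}"] + notes)

report = run_strategy("universal deep research")
```

Because steps 1 and 3 never touch the model, GPU time is spent only on the per-source summaries — the cost property the article attributes to UDR.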
大模型破译甲骨文创下新SOTA!复旦团队推出新框架
量子位· 2025-09-08 05:04
Core Viewpoint
- The article discusses a novel explainable framework for deciphering oracle bone script based on radical and pictographic analysis, achieving state-of-the-art (SOTA) accuracy in character recognition and zero-shot decoding capabilities [1][5][71].

Group 1: Framework and Methodology
- The proposed method integrates radical recognition and pictographic semantic understanding to bridge the gap between the visual forms and meanings of oracle bone characters [5][71].
- A progressive training strategy guides the model from radical identification to pictographic analysis, culminating in a joint analysis phase [6][15].
- The framework employs a dual matching mechanism that improves zero-shot decoding by selecting appropriate candidates from a dictionary based on the analysis results [28][71].

Group 2: Dataset and Training
- The research team created the PD-OBS dataset, which includes 47,157 Chinese characters annotated with oracle bone images and pictographic analysis texts, providing a valuable resource for future studies [9][73].
- The dataset links characters to oracle bone images, ancient script images, and modern script images, with annotations for radical and pictographic analysis [10][73].

Group 3: Experimental Results
- The proposed method was evaluated against existing methods on the HUST-OBC and EV-OBC datasets, demonstrating superior performance in both validation and zero-shot settings [36][38].
- In zero-shot scenarios, the new method outperformed all other approaches, improving Top-10 accuracy by 26.2% on HUST-OBC and 13.6% on EV-OBC [45][46].
- The explainability of the model's outputs was quantitatively assessed using BERT-Score, showing significant improvements over other large visual-language models [47][49].

Group 4: Qualitative Analysis
- The model exhibited strong recognition on the validation set and generalized well in zero-shot settings, even for previously undeciphered characters [66][68].
- The dual analysis of radicals and pictographs provides a comprehensive visual-semantic mapping, enhancing the model's ability to generate semantically grounded and interpretable outputs [68][70].
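The dual matching idea — scoring dictionary candidates against both the predicted radicals and the pictographic description — might be sketched roughly as below. The scoring weights, the toy dictionary, and the bag-of-words similarity are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of dual matching: rank dictionary candidates by combining
# radical overlap with a crude bag-of-words similarity over the
# pictographic analysis text. Weights and data are invented for illustration.

def text_sim(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens (a stand-in for a real
    semantic similarity model)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def dual_match(pred_radicals, pred_gloss, dictionary, w_rad=0.5):
    scored = []
    for char, (radicals, gloss) in dictionary.items():
        rad_score = len(set(pred_radicals) & set(radicals)) / max(len(radicals), 1)
        combined = w_rad * rad_score + (1 - w_rad) * text_sim(pred_gloss, gloss)
        scored.append((combined, char))
    return [c for _, c in sorted(scored, reverse=True)]  # best first

dictionary = {
    "明": (["日", "月"], "sun and moon together meaning bright"),
    "休": (["人", "木"], "a person resting against a tree"),
}
ranking = dual_match(["日", "月"], "sun beside moon meaning bright", dictionary)
```

A candidate that matches on both channels outranks one that matches on only one, which is the property the paper credits for the zero-shot gains.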
开放全栈!超越π0,具身智能基础大模型迎来真·开源,开发者狂喜
量子位· 2025-09-08 05:04
Core Viewpoint
- The article highlights the launch of WALL-OSS, an open-source embodied-intelligence model from China that surpasses previous models such as π0 on various metrics [1][5][17].

Group 1: Model Features
- WALL-OSS is a general-purpose embodied model with strong generalization and reasoning capabilities, allowing quick fine-tuning on proprietary systems [2].
- It is a multimodal model that can process and output language, video, and actions, demonstrating strong causal reasoning and spatial understanding [3].
- With 4.2 billion parameters, WALL-OSS is the only open-source embodied model providing end-to-end unified output across language, vision, and action [5][27].

Group 2: Team and Development
- The development team, 自变量机器人, was founded in late 2023 and has focused on end-to-end models, launching WALL-A, the largest unified embodied model globally [9].
- The team recently closed nearly 1 billion yuan in Series A+ financing, with major investors including Alibaba Cloud and Sequoia [13][14].

Group 3: Performance and Evaluation
- WALL-OSS shows superior performance in both in-distribution (ID) and out-of-distribution (OOD) evaluations, maintaining high task success rates across varied scenarios [17].
- It outperforms baseline models on long-horizon tasks requiring instruction breakdown and on reasoning tasks that rely on chain-of-thought (CoT) [19][20].
- The model retains the core functionality of the underlying VLM while enhancing its capabilities, as shown in multimodal benchmark tests [22].

Group 4: Technical Innovations
- WALL-OSS addresses the "impossible triangle" of modality unification, action precision, and capability generalization through systematic architecture and training-paradigm innovations [32].
- The model employs a novel architecture combining shared attention and expert-flow mechanisms, allowing effective information processing across modalities [34].
- It uses a two-stage training strategy to strengthen spatial and semantic understanding while preserving the original VLM capabilities [41][45].

Group 5: Open Source Strategy
- WALL-OSS is fully open-sourced, providing a complete reproducible solution including pre-trained weights, training code, and deployment documentation [52][53].
- This significantly lowers the entry barrier for developers, enabling rapid adaptation and deployment of advanced embodied intelligence [56].
- The open-source approach aims to foster industry growth by providing a robust foundational model usable across applications [68].
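The "shared attention plus expert branches" idea — one attention trunk whose output feeds modality-specific heads (e.g., a language head and an action head) — can be caricatured in a few lines of numpy. Everything here (dimensions, single-head attention, two dense heads) is a simplified assumption for illustration, not WALL-OSS's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # toy hidden size
seq = 5        # toy sequence length

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention (numerically stable softmax)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = rng.normal(size=(seq, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
shared = attention(x, wq, wk, wv)     # one trunk shared by all modalities

w_lang = rng.normal(size=(d, 16))     # language "expert" head (toy vocab of 16)
w_act = rng.normal(size=(d, 7))       # action "expert" head (toy 7-DoF output)
lang_logits = shared @ w_lang         # (seq, 16) token logits
action = shared @ w_act               # (seq, 7) continuous action values
```

The design point the article describes is that the trunk is computed once and both modalities read from it, rather than running separate language and action models.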
OpenAI内部大重组!模型行为团队并入Post Training,负责人另起炉灶
量子位· 2025-09-08 05:04
By 克雷西, from 凹非寺 — 量子位 | WeChat official account QbitAI

OpenAI is setting up yet another new team! According to TechCrunch, OpenAI is reorganizing its Model Behavior team.

The Model Behavior team is the core research group inside OpenAI dedicated to shaping models' "personality"; it has taken part in post-training work on GPT-4, GPT-4o, GPT-4.5, and GPT-5.

As part of the reorganization, the team's founding lead, Joanne Jang, is building a new research team called OAI Labs.

OAI Labs' mission is to "invent and prototype entirely new ways for humans and AI to collaborate," with a focus no longer limited to chat or agent modes: AI is instead treated as "a tool capable of thinking, creating, learning, and connecting."

The lab is still in its early formation stage, but two main research directions are already clear.

First, exploration at the interaction layer: organized around the concept of "interactive tools," the team will design multimodal prototypes that can be directly manipulated, reused, and deliver real-time feedback across iterations, helping users shape model intent more intuitively, change direction at any point in the creative workflow, and visualize the AI's multiple reasoning paths and intermediate results.

Second, continued research at the behavior layer: OAI Labs absorbs the original ...
光刻机巨头ASML,108亿控股了一家大模型公司
量子位· 2025-09-08 03:05
Core Viewpoint
- ASML has become the largest shareholder of Mistral AI, investing €1.3 billion (approximately ¥10.8 billion) in a funding round that values Mistral AI at roughly €10 billion (approximately ¥83.5 billion), making it the most valuable AI company in Europe [1][6].

Group 1: Investment Details
- ASML led Mistral AI's Series C financing, which totaled €1.7 billion (approximately ¥14.2 billion) [1].
- The investment gives ASML a board seat at Mistral AI, indicating a deeper strategic partnership [3].
- Mistral AI's valuation has climbed from about €240 million in June 2023 to €11.7 billion post-money in the latest round, showcasing rapid growth [9].

Group 2: Mistral AI's Growth Journey
- Mistral AI has risen rapidly, reaching a valuation of over €10 billion in just over two years since its founding [8].
- The company initially gained traction with open-source models like Mistral 7B and Mixtral 8x7B, which were well received in the developer community [10][12].
- Mistral AI has diversified its product line, including a chat assistant named Le Chat that competes with ChatGPT, along with various tools for code generation and enhanced reasoning [13].

Group 3: Strategic Implications for ASML
- The investment marks a significant step for ASML as it moves beyond pure hardware manufacturing toward incorporating AI applications into its operations [16].
- Integrating Mistral's AI capabilities into ASML's lithography systems could improve process precision and production efficiency [17].
- The collaboration mirrors successful industry examples, such as NVIDIA's partnership with TSMC, which has driven significant improvements in semiconductor manufacturing [19].
幸好图灵不是一位好棋手
量子位· 2025-09-07 07:00
Core Viewpoint
- The article explores a counterfactual: had Alan Turing been a master chess player, the trajectory of AI development might have been significantly different, underscoring how his collaboration with Donald Michie shaped AI research [1][48].

Group 1: Turing's Chess Skills and Impact
- Turing played chess but was not particularly skilled, which led him to seek a more evenly matched opponent in Donald Michie [7][8][17].
- Turing and Michie's friendship grew through their chess games, which often included discussions of "learning machines" and "mechanizing chess," influencing their future work in AI [20][22].

Group 2: Development of AI Algorithms
- Michie developed a paper-based chess algorithm called MACHIAVELLI, which used a "look one step ahead" strategy, similar in spirit to Turing's Bombe machine approach [23][26].
- The concept of heuristic search that emerged from their discussions became a foundational AI method for solving complex problems [33][34].

Group 3: Chess as a Tool for AI Research
- Michie believed studying chess was crucial for AI research, since it offered a structured environment for exploring cognitive functions and decision-making processes [42][43].
- His work on chess endgames significantly influenced AI projects in the 1970s and 1980s, demonstrating the relevance of chess to advancing machine intelligence [44].

Group 4: Legacy and Modern Perspectives
- The article concludes by reflecting on how Turing's lack of chess mastery may have inadvertently contributed to the development of AI, highlighting the broader role of chess in understanding machine intelligence [48][49].
- Ongoing AGI discourse suggests a complex relationship between chess proficiency and logical reasoning: high chess skill does not necessarily correlate with excellence in other domains [51][52].
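Michie's "look one step ahead" strategy is the simplest possible search: apply each legal move, score the resulting position with a heuristic, and pick the best. A minimal sketch — the toy "game," its moves, and the heuristic are invented for illustration, not MACHIAVELLI's actual rules:

```python
# One-ply lookahead: score each candidate move with a heuristic and
# take the argmax. The "game" here is a toy: a move changes a single
# integer position, and the heuristic prefers positions near a goal.

GOAL = 10

def heuristic(position: int) -> float:
    """Higher is better; closest to the goal wins."""
    return -abs(GOAL - position)

def one_ply_lookahead(position: int, moves) -> int:
    # Look exactly one step ahead: apply each move, score the result.
    return max(moves, key=lambda m: heuristic(position + m))

best = one_ply_lookahead(7, moves=[-1, 1, 2, 5])   # 7 + 2 = 9 is nearest to 10
```

Heuristic search as later formalized in AI generalizes exactly this pattern to deeper trees and smarter evaluation functions.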
3999让机器人家务全包,抱抱脸联合创始人:开源YYDS!
量子位· 2025-09-07 04:36
Core Viewpoint
- The article discusses the launch of XLeRobot, an open-source DIY robot project initiated by Chinese researcher Wang Gaotian, priced at only 3,999 yuan, making it an affordable option for home use and DIY enthusiasts [8][12].

Summary by Sections

Product Overview
- XLeRobot is a versatile home robot capable of tasks such as cleaning, watering plants, and playing with pets [2][4][6].
- The project has drawn attention and recommendations from notable figures, including Thomas Wolf, co-founder of Hugging Face [9].

Cost and Components
- The base build costs 3,999 yuan in China; the equivalent build runs about $660 in the US and €680 in the EU, still far below comparable commercial robots [13].
- The robot's affordability comes from the ability to customize components and substitute cheaper alternatives [12].
- Key components include an open-source low-cost robotic arm, RGB cameras, a Raspberry Pi, and other easily sourced parts [13][16].

Assembly and Usability
- Estimated assembly time is around 4 hours, comparable to building with LEGO, making it accessible to DIY enthusiasts [17].
- The project provides comprehensive tutorials for setup and operation, improving the user experience [22][24].

Community and Open Source
- The project has sparked significant interest in the open-source community, reaching 1.6k stars on GitHub shortly after release [30].
- Users are eager to experiment with the robot, citing the benefits of open-source innovation and cost savings [30].

Future Developments
- Future upgrades for XLeRobot are expected to be modular, letting users enhance their robots with additional components [33].
- The project aims to provide a practical platform for those interested in robotics and embodied AI, while also serving as a testbed for Wang Gaotian's research [41].

Team Background
- Wang Gaotian, the project's initiator, has a strong academic background in robotics and has collaborated with Boston Dynamics on significant research [38].
- The team includes contributors responsible for areas such as reinforcement-learning deployment and documentation [42][43].
深度长文AI一键生成:实测字节扣子空间新功能
量子位· 2025-09-07 04:36
Core Viewpoint
- The article emphasizes using AI tools, specifically 扣子空间, to enhance writing capability and support deep writing, which combines systematic thinking with efficient expression [1][2][3].

Group 1: AI Writing Tools Overview
- 扣子空间 is described as a "deep long-form writing accelerator" that helps creators produce in-depth content [2].
- Deep writing is defined as a combination of systematic thinking and efficient expression, rather than mere word accumulation [3].
- The article evaluates the effectiveness of 扣子空间 along three axes: breadth, depth, and precision [4].

Group 2: Features and Functionality
- A standout feature of 扣子空间 is one-click prompt optimization, which simplifies the writing process [5].
- Users can modify existing templates if the generated results do not meet their expectations [6].
- The tool can generate academic-style articles, showcasing its capability to produce impressive results [7].

Group 3: Research and Data Access
- The tool provides access to research papers and case studies, deepening content creation [9][10].
- Users can search specific topics and browse a range of results, indicating robust research capabilities [11].
- Clickable references and links are highlighted as a significant advantage, supporting the credibility of generated content [9][14].

Group 4: Content Creation Examples
- The article shows how 扣子空间 can analyze social topics and generate engaging content, demonstrating its versatility [19][20].
- The tool can produce travel diaries and other narrative forms, though it may not fully capture the essence of storytelling [21][24].
- While it excels at quickly generating structured content, it may not suit creative story writing [33][34].

Group 5: User Experience and Limitations
- Users report that 扣子空间 is particularly effective for content-focused self-media, enabling quick data collection and organization [32].
- Its limitations in creative writing are acknowledged: results can resemble standard articles rather than imaginative narratives [36][37].
- The article concludes that while AI tools can assist the writing process, they cannot replace genuine creativity [38].
拜拜Claude!阿里最强万亿模型编程秒了Opus4,实测在此
量子位· 2025-09-06 04:21
Core Viewpoint
- Alibaba has launched its largest model to date, Qwen3-Max-Preview, with a total parameter count of 1 trillion, significantly enhancing its capabilities compared to the previous version, Qwen3, which had 235 billion parameters [1][2].

Summary by Sections

Model Specifications
- Qwen3-Max-Preview's parameter count is roughly four times that of its predecessor, Qwen3 [2].
- The new version shows significant improvements in understanding Chinese and English, following complex instructions, and tool use, while also reducing knowledge hallucinations [2].

Availability and Performance
- The model is fully launched and can be accessed through the Tongyi APP, the Qwen Chat web interface, and the Alibaba Cloud API [3].
- Initial evaluations indicate that Qwen3-Max-Preview outperforms not only its predecessor but also competitors like Claude Opus 4 [4][6].

User Experience and Testing
- Users have reported a strong desire to test the model given its impressive scale [6].
- The model demonstrated its problem-solving abilities by answering an AIME math-competition question and generating code for interactive animations and games like Minesweeper [13][15][19].

Technical Details
- Qwen3-Max-Preview supports multimodal inputs, allowing users to upload images directly [12].
- The model's code generation is noted as fast and accurate [23][26].

Pricing and API Information
- API pricing is tiered by input-token count, with specific rates for different token ranges [27].
- The model supports a context length of 262,144 tokens, with maximum input and output limits specified [28].

Future Developments
- Although the model has not been open-sourced, future versions are expected, including a reasoning version that may be released soon [30][34].
- The lead of the open-source team has expressed confidence in the model's capabilities and future expansion [31][32].
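Tiered, token-count-based API pricing of the kind described can be computed with a small helper. The tier boundaries and per-1K-token rates below are placeholders — the article does not reproduce Alibaba's actual rate card — but the 262,144-token cap matches the stated context length.

```python
# Hypothetical tiered pricing: the rate applied depends on which band the
# request's input-token count falls into. Boundaries and rates are made up.

TIERS = [
    # (upper bound of tier in input tokens, price per 1K input tokens, USD)
    (32_000, 0.001),
    (128_000, 0.002),
    (262_144, 0.003),   # 262,144 = the model's stated context length
]

def input_cost(input_tokens: int) -> float:
    """Bill the whole request at the rate of the band it falls into."""
    for bound, per_1k in TIERS:
        if input_tokens <= bound:
            return input_tokens / 1000 * per_1k
    raise ValueError("request exceeds the 262,144-token context window")

cost = input_cost(50_000)   # falls in the second band: 50 * 0.002 = 0.1
```

Real tiered schemes sometimes bill each band marginally rather than applying one band's rate to the whole request; which variant applies here is not stated in the article.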