Major shake-up inside OpenAI! Model Behavior team folded into Post Training as its lead starts a new group
量子位· 2025-09-08 05:04
Cressy, from Aofei Temple | QbitAI (WeChat official account QbitAI)

OpenAI is setting up yet another new team! According to TechCrunch, OpenAI is reorganizing its Model Behavior team. The Model Behavior team is the core research group inside OpenAI dedicated to shaping model "personality"; to date it has taken part in the post-training of GPT-4, GPT-4o, GPT-4.5, and GPT-5. As the reorganization proceeds, the team's founding lead, Joanne Jang, is building a new research team called OAI Labs. The creation of OAI Labs is part of the Model Behavior team's restructuring. Its mission is to "invent and prototype entirely new ways for humans and AI to collaborate," with a focus no longer limited to chat or agent modes, but on treating AI as "an instrument capable of thinking, creating, learning, and connecting." The lab is still in its early days, but two main research directions are already set. The first is interaction-level exploration: built around the concept of "interactive tools," the team will design multimodal prototypes that can be directly manipulated, reused, and given real-time feedback during iteration, helping users shape model intent more intuitively, change direction at any point in the creative workflow, and visualize the model's multiple reasoning paths and intermediate results. The second is continued behavior-level research: OAI Labs has absorbed the original ...
Lithography giant ASML puts ¥10.8 billion into a large-model company, becoming its largest shareholder
量子位· 2025-09-08 03:05
Core Viewpoint
- ASML has become the largest shareholder of Mistral AI, investing €1.3 billion (approximately ¥10.8 billion) in a funding round that values Mistral AI at €10 billion (approximately ¥83.5 billion) before the new money, making it the most valuable AI company in Europe [1][6].

Group 1: Investment Details
- ASML led Mistral AI's Series C financing, which totaled €1.7 billion (approximately ¥14.2 billion) [1].
- The investment gives ASML a board seat at Mistral AI, indicating a deeper strategic partnership [3].
- Mistral AI's valuation has skyrocketed from roughly €240 million in June 2023 to €11.7 billion post-money in the latest funding round, showcasing rapid growth [9].

Group 2: Mistral AI's Growth Journey
- Mistral AI has risen at a meteoric pace, passing a €10 billion valuation just over two years after its founding [8].
- The company initially gained traction with open-source models such as Mistral 7B and Mixtral 8x7B, which were well received in the developer community [10][12].
- Mistral AI has diversified its product offerings, including the chat assistant Le Chat, which competes with ChatGPT, plus various tools for code generation and enhanced reasoning [13].

Group 3: Strategic Implications for ASML
- The investment marks a significant step for ASML as it moves beyond pure hardware manufacturing toward incorporating AI applications into its operations [16].
- Integrating Mistral's AI capabilities into ASML's lithography systems could improve process precision and production efficiency [17].
- The collaboration mirrors successful industry examples such as NVIDIA's partnership with TSMC, which has driven significant improvements in semiconductor manufacturing [19].
Luckily, Turing was not a good chess player
量子位· 2025-09-07 07:00
Core Viewpoint
- The article explores the hypothetical scenario in which, had Alan Turing been a master chess player, the trajectory of AI development might have been significantly different, emphasizing how his collaboration with Donald Michie shaped early AI research [1][48].

Group 1: Turing's Chess Skills and Impact
- Turing played chess but was not particularly skilled, which led him to seek a more evenly matched opponent in Donald Michie [7][8][17].
- Turing and Michie's friendship grew through their chess games, which often turned to discussions of "learning machines" and "mechanizing chess," influencing their later work in AI [20][22].

Group 2: Development of AI Algorithms
- Michie devised a paper-based chess algorithm called MACHIAVELLI, which used a "look one step ahead" strategy reminiscent of Turing's approach to the Bombe machine [23][26].
- The concept of heuristic search that emerged from their discussions became a foundational AI method for tackling complex problems [33][34].

Group 3: Chess as a Tool for AI Research
- Michie believed studying chess was crucial for AI research, since it offered a structured environment for exploring cognitive functions and decision-making [42][43].
- His work on chess endgames significantly influenced AI projects in the 1970s and 1980s, demonstrating chess's relevance to advancing machine intelligence [44].

Group 4: Legacy and Modern Perspectives
- The article concludes that Turing's lack of chess mastery may have inadvertently aided the development of AI, highlighting chess's broader role in understanding machine intelligence [48][49].
- Ongoing debate around AGI (Artificial General Intelligence) suggests a complex relationship between chess proficiency and logical reasoning: high chess skill does not necessarily translate into excellence in other domains [51][52].
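The "look one step ahead" strategy attributed to MACHIAVELLI can be sketched in a few lines: score every legal move with a heuristic and pick the best. The game below is deliberately abstract; the state, moves, and evaluation function are placeholders for illustration, not Michie's actual rules.

```python
# Toy sketch of one-ply lookahead: evaluate each legal move's resulting
# state with a heuristic and choose the move with the best score.
def one_ply_lookahead(state, legal_moves, apply_move, evaluate):
    return max(legal_moves(state), key=lambda m: evaluate(apply_move(state, m)))

# Toy "game": state is a number, a move adds its value, and the
# heuristic prefers states closest to 10.
best = one_ply_lookahead(
    state=7,
    legal_moves=lambda s: [1, 2, 3, 5],
    apply_move=lambda s, m: s + m,
    evaluate=lambda s: -abs(10 - s),
)
print(best)  # → 3
```

Deeper heuristic search generalizes this by recursing on `apply_move` instead of stopping at depth one.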
¥3,999 for a robot that does all the housework; Hugging Face co-founder: open source is the GOAT!
量子位· 2025-09-07 04:36
Core Viewpoint
- The article covers the launch of XLeRobot, an open-source DIY robot project initiated by Chinese researcher Wang Gaotian; at just 3,999 yuan, it is an affordable option for home use and DIY enthusiasts [8][12].

Summary by Sections

Product Overview
- XLeRobot is a versatile home robot capable of tasks such as cleaning, watering plants, and playing with pets [2][4][6].
- The project has drawn attention and recommendations from notable figures, including Hugging Face co-founder Thomas Wolf [9].

Cost and Components
- The base build costs 3,999 yuan in China; equivalent builds run about $660 in the US and €680 in the EU, still far below comparable commercial home robots [13].
- The low price comes from the freedom to customize components and substitute cheaper alternatives [12].
- Key components include an open-source low-cost robotic arm, RGB cameras, a Raspberry Pi, and other easily sourced parts [13][16].

Assembly and Usability
- Estimated assembly time is around 4 hours, comparable to building a LEGO set, making it accessible to DIY enthusiasts [17].
- The project ships comprehensive tutorials for setup and operation, improving the user experience [22][24].

Community and Open Source
- The project has generated significant interest in the open-source community, reaching 1.6k stars on GitHub shortly after release [30].
- Users are eager to experiment with the robot, citing the benefits of open-source innovation and cost savings [30].

Future Developments
- Future XLeRobot upgrades are expected to be modular, letting users extend their robots with additional components [33].
- The project aims to offer a practical platform for people interested in robotics and embodied AI, while also serving as a testbed for Wang Gaotian's research [41].

Team Background
- Wang Gaotian, the project's initiator, has a strong academic background in robotics and has collaborated with Boston Dynamics on significant research [38].
- The team includes contributors responsible for areas such as reinforcement-learning deployment and documentation [42][43].
In-depth long-form articles generated by AI in one click: hands-on with ByteDance's new 扣子空间 (Coze Space) features
量子位· 2025-09-07 04:36
Core Viewpoint
- The article examines how AI tools, specifically 扣子空间 (Coze Space), can strengthen writing ability and support deep writing, which combines systematic thinking with efficient expression [1][2][3].

Group 1: AI Writing Tools Overview
- 扣子空间 is described as a "deep long-form writing accelerator" that helps creators produce in-depth content [2].
- Deep writing is defined as systematic thinking plus efficient expression, not mere word accumulation [3].
- The article sets out to evaluate 扣子空间 on breadth, depth, and precision [4].

Group 2: Features and Functionality
- A standout feature is one-click prompt optimization, which simplifies the writing process [5].
- Users can modify existing templates when the generated results fall short of expectations [6].
- The tool can generate academic-style articles, showing its capacity for impressive results [7].

Group 3: Research and Data Access
- The tool surfaces research papers and case studies, deepening the content it produces [9][10].
- Users can search specific topics and browse a range of results, indicating robust research capabilities [11].
- Clickable references and links are highlighted as a significant advantage, supporting the credibility of generated content [9][14].

Group 4: Content Creation Examples
- The article shows 扣子空间 analyzing social topics and generating engaging content, demonstrating its versatility [19][20].
- The tool can produce travel diaries and other narrative forms, though it may not fully capture the essence of storytelling [21][24].
- While it excels at quickly generating structured content, it may not suit creative story writing [33][34].

Group 5: User Experience and Limitations
- Users report that 扣子空间 works especially well for content-focused self-media, enabling quick data collection and organization [32].
- Its limits in creative writing are acknowledged: results can read like standard articles rather than imaginative narratives [36][37].
- The article concludes that while AI tools can assist the writing process, they cannot replace genuine creativity [38].
Large model sets a new SOTA for deciphering oracle bone script! Fudan team unveils a new framework
量子位· 2025-09-07 04:36
Core Viewpoint
- The article presents a novel explainable framework for deciphering oracle bone script based on radical and pictographic analysis, achieving state-of-the-art (SOTA) accuracy in character recognition and strong zero-shot capability [1][5][71].

Group 1: Methodology and Framework
- The method integrates radical recognition with pictographic semantic understanding to bridge the gap between the visual forms and meanings of oracle bone characters [5][71].
- A progressive training strategy guides the model from radical identification to pictographic analysis, culminating in a joint analysis that strengthens the deciphering process [6][15][22].
- A dual matching mechanism selects suitable candidates from a dictionary based on the analysis results, improving zero-shot performance [28][71].

Group 2: Dataset and Training
- The team built the PD-OBS dataset: 47,157 Chinese characters annotated with oracle bone images and pictographic-analysis texts, a valuable resource for future studies [9][73].
- The dataset links characters to oracle bone images, ancient-script images, and modern standard-script images, with annotations for radical and pictographic analysis [10][73].

Group 3: Experimental Results
- Evaluated against existing methods on the HUST-OBC and EV-OBC datasets, the approach posts competitive Top-1 and Top-10 accuracy, excelling especially in zero-shot scenarios [38][45].
- In zero-shot settings it outperformed all other methods, improving Top-10 accuracy by 26.2% on HUST-OBC and 13.6% on EV-OBC [45][46].
- Explainability of the model's outputs was quantified with BERT-Score, showing higher reliability than other large vision-language models [47][50].

Group 4: Qualitative Analysis
- The model showed strong recognition in both validation and zero-shot settings, generating semantically reasonable predictions for characters not yet deciphered by human experts [66][68].
- The dual analysis of radicals and pictographs provides a comprehensive visual-semantic mapping, letting the model produce interpretable outputs even for undeciphered characters [68][70].
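The dual matching step described above can be sketched as ranking dictionary candidates by a weighted mix of a radical-analysis similarity and a pictographic-semantics similarity. Everything below (embeddings, weights, the `dual_match` helper) is an illustrative assumption, not the Fudan team's actual implementation.

```python
# Hypothetical sketch of dual matching: each dictionary entry carries a
# radical-based vector and a pictograph-based vector; candidates are ranked
# by a weighted sum of cosine similarities to the model's two analyses.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dual_match(radical_vec, picto_vec, dictionary, alpha=0.5):
    """Rank candidate characters by combining both similarity channels."""
    scored = []
    for char, (r_vec, p_vec) in dictionary.items():
        score = alpha * cosine(radical_vec, r_vec) \
              + (1 - alpha) * cosine(picto_vec, p_vec)
        scored.append((char, score))
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy example: two candidates with made-up 3-d "embeddings".
dictionary = {
    "日": ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "月": ([0.0, 1.0, 0.0], [0.1, 0.9, 0.0]),
}
ranking = dual_match([0.95, 0.05, 0.0], [0.9, 0.1, 0.0], dictionary)
print(ranking[0][0])  # top-ranked candidate
```

Combining two channels this way is what lets an unseen (zero-shot) glyph still land near the right dictionary entry when either channel alone is ambiguous.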
Bye-bye, Claude! Alibaba's strongest trillion-parameter model crushes Opus 4 at coding — hands-on test here
量子位· 2025-09-06 04:21
Core Viewpoint
- Alibaba has launched its largest model to date, Qwen3-Max-Preview, with a total parameter count of 1 trillion, a major step up from the previous flagship Qwen3, which topped out at 235 billion parameters [1][2].

Summary by Sections

Model Specifications
- Qwen3-Max-Preview's parameter count is more than four times that of its predecessor, Qwen3 [2].
- The new version shows marked gains in Chinese and English understanding, complex instruction following, and tool use, while also reducing knowledge hallucinations [2].

Availability and Performance
- The model is fully launched and available through the Tongyi APP, the Qwen Chat web interface, and the Alibaba Cloud API [3].
- Initial evaluations indicate that Qwen3-Max-Preview outperforms not only its predecessor but also competitors such as Claude Opus 4 [4][6].

User Experience and Testing
- The model's sheer scale has driven strong interest in testing it [6].
- It demonstrated its problem-solving ability by correctly answering an AIME math-competition question and generating code for interactive animations and games such as Minesweeper [13][15][19].

Technical Details
- Qwen3-Max-Preview supports multimodal input, letting users upload images directly [12].
- Its code generation is noted as fast and accurate, producing results quickly [23][26].

Pricing and API Information
- API pricing is tiered by input token count, with specific rates for different token ranges [27].
- The model supports a context length of 262,144 tokens, with stated maximum input and output limits [28].

Future Developments
- The model has not been officially open-sourced, but future versions are expected, including a reasoning version that may ship soon [30][34].
- The open-source team's lead has expressed confidence in the model's capabilities and future expansion [31][32].
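With tiered input pricing and a hard 262,144-token context window, a caller would typically budget tokens before sending a request. A minimal sketch, assuming a rough 4-characters-per-token estimate (an illustrative heuristic, not Qwen's actual tokenizer) and the context limit stated above:

```python
# Pre-flight check: will prompt + requested output fit in the stated
# 262,144-token context window? The chars-per-token ratio is a rough
# illustrative assumption; a real client would use the model's tokenizer.
CONTEXT_LIMIT = 262_144

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_output_tokens: int) -> bool:
    return estimate_tokens(prompt) + max_output_tokens <= CONTEXT_LIMIT

print(fits_context("hello " * 1000, 8192))
```

Since pricing is tiered by input tokens, the same estimate also tells you which rate band a request will land in before you pay for it.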
ByteDance releases an all-round robot foundation model, with Li Hang leading the team
量子位· 2025-09-06 04:21
Core Viewpoint
- ByteDance's Seed team has introduced Robix, a single model that integrates reasoning, task planning, and natural-language interaction for robots, eliminating the need for multiple separate modules [1][4][27].

Group 1: Robix Model Overview
- Robix handles high-level cognitive tasks while a lower-level system (a VLA model) executes the commands Robix issues [6][9].
- It is a unified vision-language model that processes images and language together, streamlining communication and decision-making [10][11].
- It employs chain-of-thought reasoning and a three-stage training strategy to strengthen its capabilities [11][12].

Group 2: Training Methodology
- Training proceeds in three phases:
  1. Continued pre-training on extensive robot-related data to build 3D spatial understanding and ground language in vision.
  2. Supervised fine-tuning on real-world scenarios to teach task handling and basic conversation skills.
  3. Reinforcement learning with a reward system to correct mismatches between reasoning and action [19][20].

Group 3: Performance Metrics
- In foundational-ability tests, Robix beat Qwen2.5-VL on 7 of 8 spatial-understanding tasks, with higher average accuracy [21].
- Across benchmarks, Robix surpasses closed-source models such as GPT-4o and Gemini 2.5 Pro in most tests [21][22].
- In real-world interaction tests, Robix-32B reached an average task progress of 92.5%, exceeding Gemini 2.5 Pro and GPT-4o by 4.3 and 28.1 percentage points, respectively [25].

Group 4: Leadership and Development
- The project is led by Dr. Li Hang, a veteran of AI and robotics who previously headed Huawei's Noah's Ark Lab [28][30].
- Despite retirement rumors, Dr. Li continues to contribute to the project in a consulting capacity [31].
Just reorder the training data and large models get smarter! No need to scale up model or data size
量子位· 2025-09-06 04:21
Core Viewpoint
- The article highlights the importance of data-organization order in language-model training, introducing a new paradigm called DELT (Data Efficacy in LM Training) that improves model performance without increasing data volume or model size [1][3][11].

Group 1: Data Efficiency vs. Data Efficacy
- Data efficiency improves training efficiency through data selection; data efficacy improves model performance through data organization, a dimension that has been largely overlooked [5][6][15].
- A cooking analogy: data efficiency is like picking fresh ingredients, while data efficacy is like a chef timing when to add each spice to maximize flavor [7].

Group 2: Importance of Data Organization
- The order of training samples is crucial: modern language models often see the data for only a limited number of passes, so presentation order has a significant impact [9][10].
- The DELT paradigm aims to fully exploit the potential of training data by introducing data-ordering strategies, improving both efficiency and efficacy [11][13].

Group 3: DELT Paradigm Components
- DELT integrates three core components: data scoring, data selection, and data ordering, where scoring assigns each sample a score based on attributes such as difficulty and quality [19][20].
- A novel folding ordering method is proposed to raise data efficacy by preventing model forgetting and keeping the data distribution balanced [23][27].

Group 4: Performance Results
- DELT delivers significant gains across model sizes and data scales, beating conventional training on multiple evaluation metrics [28].
- For instance, at a 1-billion-parameter model size, DELT reached an average score of 39.17% versus 37.77% for the conventional approach [28].

Group 5: Implications for AI Training
- DELT offers a new perspective for data-centric AI, suggesting that AI training should adopt personalized, structured curricula similar to human education [29][30].
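The score → select → order pipeline can be sketched in a few lines. The "folding" step below is one plausible reading (sort by score, deal samples round-robin into folds so each fold spans the full score range, then concatenate the folds); the paper's exact procedure may differ, so treat this as an illustration of the three-component structure, not DELT itself.

```python
# Hedged sketch of a DELT-style pipeline: score every sample, keep the
# best-scoring fraction, then "fold" the kept samples so similar scores
# are spread across the training order instead of clustered together.
def delt_order(samples, score_fn, keep_ratio=0.8, num_folds=4):
    # 1) Data scoring: order samples by score (e.g. difficulty/quality).
    scored = sorted(samples, key=score_fn)
    # 2) Data selection: keep only the top fraction by score.
    kept = scored[-int(len(scored) * keep_ratio):]
    # 3) Data ordering: round-robin folds, then concatenate.
    folds = [kept[i::num_folds] for i in range(num_folds)]
    return [s for fold in folds for s in fold]

samples = list(range(10))  # toy "samples"; score is the value itself
ordered = delt_order(samples, score_fn=lambda x: x)
print(ordered)  # → [2, 6, 3, 7, 4, 8, 5, 9]
```

The point of the interleaving is the anti-forgetting property claimed above: no long stretch of the training order is dominated by one score band.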
A new bar for video understanding: Kuaishou open-sources its multimodal reasoning model with 128k context + 0.1-second video grounding + cross-modal reasoning
量子位· 2025-09-05 10:56
Core Viewpoint
- Keye-VL 1.5, an advanced multimodal model developed by Kuaishou, has been open-sourced, with significant improvements in video understanding and reasoning over its predecessor [1][4][6].

Group 1: Model Capabilities
- Keye-VL 1.5 has stronger temporal grounding, locating when a given object appears in a video to 0.1-second precision [10][8].
- It introduces a Slow-Fast dual-encoding mechanism that supports a 128k-token context window while balancing speed against detail [5][8].
- It leads in multiple evaluation scenarios, scoring 73.0 on the Video-MME short-video benchmark [6][18].

Group 2: Benchmark Performance
- Keye-VL 1.5 outperforms models such as Qwen2.5-VL 7B across benchmarks including OpenCompass and MMBench, achieving top scores in its class [19][21].
- On human-annotated metrics it averaged 3.53, up 0.51 points over the preview version and ahead of competing models [24][25].

Group 3: Model Architecture
- The architecture follows a "Vision Transformer (ViT) + MLP projector + language decoder" structure, designed to capture global spatial relationships across video frames [27][28].
- A four-stage progressive pre-training pipeline draws on over 1 trillion tokens from diverse data sources to build its multimodal capability [39][41].

Group 4: Research and Development
- The Keye team has presented multiple findings at top conferences, including advances in multimodal reinforcement learning and vision-language-model governance frameworks [51][54].
- The team focuses on integrating visual, linguistic, and behavioral data to improve cognition and decision-making in AI applications [50].
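The intuition behind a Slow-Fast split under a fixed context window can be sketched as a token-budget calculation: a few "slow" frames get many visual tokens each (fine detail), while the remaining "fast" frames get few tokens each (broad temporal coverage). The ratios and per-frame token counts below are invented for illustration; they are not Keye-VL 1.5's real configuration.

```python
# Illustrative Slow-Fast budget split under a 128k-token context window.
# Every 8th frame is treated as "slow" (high-detail); the rest are "fast".
# All numbers here are assumptions for the sake of the arithmetic.
CONTEXT = 128_000

def plan_frames(num_frames, slow_tokens=256, fast_tokens=32, text_budget=8_000):
    visual_budget = CONTEXT - text_budget      # tokens left for vision
    slow = num_frames // 8                     # sparse high-detail frames
    fast = num_frames - slow                   # dense low-detail frames
    used = slow * slow_tokens + fast * fast_tokens
    return slow, fast, used, used <= visual_budget

print(plan_frames(640))  # (slow, fast, tokens used, fits in budget?)
```

The same arithmetic explains why a uniform high-detail encoding would blow the budget: 640 frames at 256 tokens each is already 163,840 tokens, over the window before any text is added.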