A Revealing New Interview with OpenAI's Two Chiefs: The Ultimate Goal Is an "Automated Researcher," and Hiring Isn't About Finding the Most High-Profile People
量子位· 2025-09-26 04:56
Core Insights
- OpenAI's latest interview reveals significant advances in GPT-5, focusing on long-horizon reasoning and on bringing agentic behavior into mainstream applications [1][7][9]
- The company emphasizes the importance of protecting foundational research while avoiding distraction from short-term product competition [6][48]

Group 1: GPT-5 Developments
- GPT-5 aims to bring reasoning capabilities into the mainstream, moving beyond previous models that focused on immediate responses [8][10]
- The model represents a strategic shift toward enhancing reasoning and agentic behavior, making both more accessible to users [9][10]

Group 2: Evaluation and Progress
- Current evaluation metrics are nearing saturation, necessitating new methods to assess models' ability to discover new insights and make practical advances in economically relevant areas [12][13]
- OpenAI plans to focus on the time span over which models can reason and make progress, with current capabilities reaching roughly 1 to 5 hours [23][25]

Group 3: Automation and Research Goals
- OpenAI's long-term goal is an automated researcher capable of discovering new ideas, starting with the automation of internal research [20][21]
- The company is interested in the duration of autonomous operation as a key evaluation metric [25]

Group 4: Reinforcement Learning (RL)
- Despite skepticism, reinforcement learning continues to thrive, with OpenAI exploring new directions and ideas [27][29]
- The evolution of reward models is expected to accelerate, simplifying the development of effective fine-tuning datasets [29][30]

Group 5: Programming and Coding
- OpenAI's GPT-5-codex is designed to optimize programming tasks, addressing previous models' inefficient allocation of problem-solving time [32][34]
- The current state of coding tools is likened to an "uncanny valley": effective, but not yet fully comparable to human performance [37][41]

Group 6: Talent Acquisition and Research Culture
- OpenAI prioritizes persistence and the ability to learn from failure in its research culture, seeking individuals with a solid technical foundation [44][46]
- The company focuses on foundational research rather than merely following competitors, fostering an innovative environment [46][48]

Group 7: Resource Allocation
- Given additional resources, OpenAI would prioritize computational power, recognizing its critical role in research and development [49][51]
- The company maintains a long-term research focus, emphasizing the importance of compute and physical constraints in future advances [52]
How Is a High-Quality Dataset of Over 10 Trillion Tokens Forged? An Interview with Ruan Yilong of China Telecom's Tianyi AI
量子位· 2025-09-26 02:08
Core Viewpoint
- The article emphasizes the importance of high-quality datasets in developing and training AI models, noting that such datasets are crucial for improving model performance and accuracy [4][6][14]

Group 1: High-Quality Datasets
- The company has amassed over 10 trillion tokens of general-model corpus data and specialized datasets covering 14 key industries, with a total storage volume of 350TB [1][6]
- These datasets are not raw data dumps: they are meticulously labeled and optimized, ready for immediate application across industries [3][4]
- High-quality datasets directly influence the accuracy, generalization, and usability of AI models, serving as the foundation for effective model training [4][5]

Group 2: Technological Infrastructure
- The company has built the Xingchen MaaS platform, which operates as a "data refinery," forming a complete closed loop of data, model, and service [6][17]
- The platform includes a data toolchain that efficiently processes diverse data types and a model toolchain that turns data into usable models, ensuring robust data-lifecycle management [18][19]
- The platform can generate synthetic data for rare or extreme scenarios, improving model robustness and safety [18][19]

Group 3: Strategic Considerations
- The company's investment in high-quality datasets is driven by national strategy, market demand, and its own operational advantages, positioning it as a key player in the AI landscape [15][16]
- The government has designated AI a national strategy, prompting the company to build data infrastructure that supports AI technology breakthroughs [15][16]
- The company aims to leverage its extensive data resources and customer base to strengthen its high-quality dataset capabilities [16]

Group 4: Industry Applications
- The company has deployed AI solutions across sectors; in textile quality inspection, it achieves over 95% accuracy in defect detection, significantly improving production efficiency [9][26]
- High-quality datasets have been built for industries including healthcare, agriculture, and smart cities, demonstrating the versatility and impact of AI applications [36][37]
- The company collaborates with various sectors to create tailored datasets that address industry-specific challenges, improving operational efficiency and service quality [36][37]

Group 5: Future Vision
- The company envisions becoming a leading provider of general AI services, focused on technological advancement, inclusive applications, and an open ecosystem for collaboration [42][43]
- It aims to cultivate a skilled AI workforce so that technological innovation translates into applications that benefit society [43][44]
- The ultimate goal is to grow the digital economy while ensuring safety and fairness in AI applications, contributing to a more equitable society [44][45]
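To make the "data refinery" idea concrete, here is a minimal sketch of a three-stage data toolchain (deduplication, quality filtering, industry labeling). The stages, length threshold, and keyword rules are generic illustrations, not the Xingchen MaaS platform's actual pipeline:

```python
def dedupe(docs):
    """Drop exact duplicate texts (case- and whitespace-insensitive)."""
    seen, out = set(), []
    for d in docs:
        key = d["text"].strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out

def quality_filter(docs, min_len=20):
    """Drop fragments too short to be useful training data (assumed threshold)."""
    return [d for d in docs if len(d["text"]) >= min_len]

def label(docs, industry_keywords):
    """Attach an industry tag when a keyword matches (toy rule-based labeler)."""
    for d in docs:
        d["industry"] = next((tag for tag, kw in industry_keywords.items()
                              if kw in d["text"]), "general")
    return docs

raw = [
    {"text": "Fabric defect detected on loom 7 during night shift inspection."},
    {"text": "Fabric defect detected on loom 7 during night shift inspection."},
    {"text": "short"},
    {"text": "Crop irrigation schedule adjusted after soil moisture reading."},
]
clean = label(quality_filter(dedupe(raw)),
              {"textile": "loom", "agriculture": "soil"})
```

A real pipeline would add PII scrubbing, fuzzy deduplication, and model-based quality scoring, but the closed-loop shape (raw data in, labeled application-ready data out) is the same.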
"Zero-Human" Medical Research: A Tsinghua AI Agent Runs Autonomously from Inspiration to Paper
量子位· 2025-09-26 02:08
Contributed by the Suo Jinli group, Department of Automation, Tsinghua University | 量子位 QbitAI

Has medical research entered the "zero-human" era?!

The Suo Jinli group at Tsinghua University's Department of Automation has released OpenLens AI, the first fully autonomous AI research framework designed specifically for medical informatics.

For the first time, it closes the entire automation loop from literature mining → experiment design → data analysis → code generation → submission-ready paper.

Why build such a system? Mainly because medical informatics research is stuck in an efficiency bind: multi-center data fusion, the explosion of knowledge, and the demands of cross-disciplinary collaboration are stretching the traditional research model increasingly thin.

OpenLens AI introduces medicine-specific quality-control methods and generates publication-grade research papers, compressing the research cycle from months to hours and heralding a "zero-human" era for medical research.

The details follow.

Five core modules: the dream team of AI research

Beyond automating the full workflow, OpenLens AI also sets a new bar for quality control, integrating four safeguard mechanisms.

OpenLens AI adopts a modular architecture in which five specialized agents work together to form a complete research-automation pipeline:

Supervisor module: the global coordinator, which decomposes user queries into structured subtasks and keeps the entire research workflow transparent and interpretable.

Literature reviewer: builds an autonomous knowledge-exploration pipeline that uses a ReAct-based reasoning framework to retrieve and synthesize relevant literature, providing the research with a solid theoretical foundation ...
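The supervisor-plus-specialists pattern described above can be sketched in a few lines. This is an illustrative toy, not OpenLens AI's actual code; the stage names and the stub agents standing in for LLM-backed modules are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    status: str = "pending"
    output: str = ""

class Supervisor:
    """Decomposes a research query into ordered subtasks and runs each agent."""
    STAGES = ["literature_review", "experiment_design",
              "data_analysis", "code_generation", "paper_writing"]

    def __init__(self, query: str):
        self.query = query
        self.subtasks = [Subtask(name=s) for s in self.STAGES]

    def run(self, agents: dict) -> list:
        context = self.query
        for task in self.subtasks:
            task.output = agents[task.name](context)  # one specialist per stage
            task.status = "done"
            context = task.output                     # pass results downstream
        return self.subtasks

# Stub agents; in the real system each would be an LLM-driven module.
agents = {s: (lambda ctx, s=s: f"{s}: built on [{ctx}]")
          for s in Supervisor.STAGES}
report = Supervisor("Predict sepsis onset from ICU time-series").run(agents)
```

The key design point is that each stage consumes the previous stage's output, so the supervisor's task list doubles as an auditable, interpretable trace of the whole run.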
Up to 3.2× Speedup for Multimodal Reasoning! Huawei Noah's Ark Lab's New Algorithm Accepted to NeurIPS 2025
量子位· 2025-09-26 02:08
Core Insights
- Huawei's Noah's Ark Lab has developed Vision-Aware Speculative Decoding (ViSpec), a new inference-acceleration framework that achieves up to 3.22× speedup for vision-language models (VLMs) without sacrificing generation quality [3][8][25]

Group 1: Current Challenges in VLMs
- Speculative decoding has become a standard method for accelerating large language model (LLM) inference, but its application to VLMs has been limited, with existing methods achieving less than 1.5× acceleration [2][4]
- The primary challenge lies in processing visual information: VLMs convert images into large numbers of "visual tokens," which makes draft models inefficient [4][6]

Group 2: ViSpec Framework Innovations
- ViSpec introduces a lightweight visual adapter that compresses image embeddings into compact visual representations, significantly improving the draft model's efficiency [9][11][12]
- A global visual-feature injection mechanism maintains the influence of visual context throughout text generation, effectively overcoming the "lost-in-the-middle" problem [13][15][17]
- The team devised a novel data-generation method to build high-quality training datasets that elicit longer, more detailed responses [18][20]

Group 3: Experimental Results
- Extensive experiments on mainstream VLMs, including LLaVA-1.6 and Qwen2.5-VL, showed speedups ranging from 1.85× to 3.22×, with an average acceleration above 2.5× [22][24]
- Ablation studies confirmed that each component contributes meaningfully to the overall speedup, with image-embedding compression alone providing a 30% performance boost [26][27][28]

Group 4: Future Outlook
- ViSpec marks a significant advance in VLM inference acceleration, paving the way for practical deployment on edge devices such as smartphones and smart-home hardware and for richer human-machine interaction [29][30]
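For background, the core speculative-decoding loop that ViSpec adapts to VLMs can be sketched as follows. The toy draft and target rules here are assumptions for illustration; the property the sketch demonstrates, that the output exactly matches what the target model alone would produce, is what allows speedups "without sacrificing generation quality":

```python
def draft(prefix: list, k: int) -> list:
    """Cheap draft model: guess the next k tokens (toy rule: count up by 1)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last += 1
        out.append(last)
    return out

def target_next(prefix: list) -> int:
    """Expensive target model (toy rule: count up by 1, but wrap 9 -> 0)."""
    last = prefix[-1]
    return 0 if last >= 9 else last + 1

def speculative_decode(prompt: list, n_tokens: int, k: int = 4) -> list:
    """Generate n_tokens; drafts are verified and accepted until the first
    mismatch, so the result is identical to running the target model alone."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        proposal = draft(seq, k)
        for tok in proposal:
            if len(seq) >= len(prompt) + n_tokens:
                break
            correct = target_next(seq)
            seq.append(correct)      # always emit the target's token
            if tok != correct:       # first mismatch: discard the rest, redraft
                break
    return seq[len(prompt):]
```

ViSpec's contribution sits inside the draft model: compressing the many visual tokens so the draft stays cheap and accurate enough for long accepted runs, which is where the speedup comes from.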
ChatGPT's New Feature Aims to Be the First App You Open in the Morning
量子位· 2025-09-26 02:08
Core Insights
- ChatGPT has introduced a new feature called ChatGPT Pulse, which delivers personalized updates without the need for user prompts, functioning as a proactive assistant [1][5][6]
- The feature learns from user interactions and integrates with calendars and email to provide tailored content, including daily briefings and suggestions [8][9][10]
- Currently, the feature is available only to Pro users, indicating a potential strategy to deepen user engagement and subscription value [15]

Group 1
- ChatGPT Pulse represents a shift from a reactive to a proactive AI assistant, capable of monitoring important tasks and surfacing timely information [5][6]
- The system generates a personalized briefing based on user data, which may include event updates, vocabulary lessons, and meal suggestions [8][9]
- User feedback on the Pulse experience is used solely to improve that individual user's interactions, keeping the experience customized [11][13]

Group 2
- The feature is designed to avoid overwhelming users with constant notifications, focusing instead on efficient problem-solving [10]
- ChatGPT Pulse can suggest activities and plans based on the user's schedule and preferences, demonstrating its utility as a personal assistant [13][14]
- The feature aligns with OpenAI's vision of intelligent agents that operate alongside users, enhancing productivity and daily life [5][6]
SOTA Even Without Much Data? Tsinghua and Shanghai AI Lab Crack Two Major Bottlenecks in Robot RL
量子位· 2025-09-26 02:08
Core Viewpoint
- The article discusses SimpleVLA-RL, an end-to-end online training solution for vision-language-action (VLA) models, aimed at enhancing the flexibility and performance of robots in complex environments while addressing existing training bottlenecks [3][12]

Group 1: Key Challenges in Existing Training Paradigms
- Current training paradigms face significant challenges, including high data-collection costs and insufficient generalization [2][8]
- Reliance on large-scale, high-quality robot operation trajectories limits scalability and raises costs, making data acquisition a major hurdle [8]
- Models struggle to generalize, particularly on out-of-distribution tasks and in new environments, with performance dropping on long-horizon dependencies and compositional tasks [8][9]

Group 2: SimpleVLA-RL Framework
- SimpleVLA-RL combines interactive trajectory sampling, result-based rewards, and enhanced exploration to tackle the core challenges of VLA model training [5][6]
- The framework achieves state-of-the-art (SOTA) performance on standard benchmarks such as LIBERO and RoboTwin, with significant improvements even under limited data [5][21]
- With only a single demonstration per task, the average success rate on LIBERO rose from 48.9% to 96.9% after applying SimpleVLA-RL [5]

Group 3: Performance Metrics and Results
- SimpleVLA-RL reached an average success rate of 99.1% on LIBERO, with long-horizon tasks improving by 12.0 percentage points [21]
- On RoboTwin 1.0, the average success rate rose from 39.8% to 70.4%, with specific tasks such as "Blocks Stack" improving by 33.1 percentage points [23]
- On RoboTwin 2.0, average success rates improved from 38.3% to 68.8% [25]

Group 4: Innovations and Discoveries
- Training gave rise to new operational strategies, such as the "Pushcut" phenomenon, in which the model autonomously discovers methods more efficient than the human demonstrations [10][31]
- This phenomenon indicates that reinforcement learning can let VLA models surpass the limits of human demonstration patterns, paving the way for future adaptive VLA models [31]
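The idea of training purely from result-based rewards can be illustrated with a minimal policy-gradient (REINFORCE) loop on a toy bandit task: the policy is scored only on final success (1 or 0), with no per-step supervision. The task, learning rate, and update rule are generic assumptions, not the SimpleVLA-RL implementation:

```python
import math
import random

random.seed(0)

N_ACTIONS = 4
GOOD_ACTION = 2                 # hypothetical "task succeeds" behavior
logits = [0.0] * N_ACTIONS      # policy parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(probs):
    r, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r <= acc:
            return a
    return len(probs) - 1

def train(steps=2000, lr=0.5):
    for _ in range(steps):
        probs = softmax(logits)
        a = sample_action(probs)
        reward = 1.0 if a == GOOD_ACTION else 0.0   # outcome-only reward
        # REINFORCE gradient for a softmax policy: (1[i == a] - p_i) * reward
        for i in range(N_ACTIONS):
            grad = ((1.0 if i == a else 0.0) - probs[i]) * reward
            logits[i] += lr * grad

train()
final_probs = softmax(logits)
```

Even with such sparse feedback the policy concentrates on the successful behavior, which is the mechanism that lets outcome rewards substitute for expensive demonstration trajectories.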
Xiaomi 17 Goes on Sale from 4,499 Yuan, Debuting the Fifth-Generation Snapdragon 8! Lei Jun: 50 Billion Yuan for In-House Chips
量子位· 2025-09-25 23:54
Core Viewpoint
- Xiaomi's latest launch event marked a clear shift toward positioning itself as a hardcore technology company, emphasizing innovation across its product lineup, particularly the Xiaomi 17 series, which targets Apple's iPhone directly [6][7][10]

Group 1: Xiaomi 17 Series
- The Xiaomi 17 series includes three models: standard, Pro, and Pro Max, starting at 4,499 yuan [3][11]
- The series debuts the fifth-generation Snapdragon 8 mobile platform, built on a 3nm process with a peak frequency of 4.6GHz, positioning it as a top-tier flagship [14][15]
- The design balances light weight and a premium feel: 8.06mm thick and 191g, with a 6.3-inch display on the standard and Pro models and a 6.9-inch display on the Pro Max [18][21]
- The camera system has been enhanced with Leica tuning, focusing on portrait photography with new algorithms for skin-tone restoration and detail enhancement [44][46]
- The Pro and Pro Max models introduce a new "back screen" feature, allowing for additional interaction and notifications [40][41]

Group 2: Battery and Display Innovations
- The standard Xiaomi 17 carries a 7,000mAh battery, while the Pro Max reaches 7,500mAh, for endurance exceeding the iPhone 17 [34][35]
- The display uses new red light-emitting materials that improve brightness efficiency by 11.4%, a significant advance for domestic manufacturing [29][31]

Group 3: Xiaomi Pad 8
- The Xiaomi Pad 8 series was also launched, featuring an 11.2-inch 3.2K display and starting at 2,199 yuan, designed to be lightweight and portable [50][51]
- The Pad runs the new Surge OS 3, enabling desktop-like functionality with support for a wide range of applications and multitasking [57][59]
- The standard version is powered by the Snapdragon 8s Gen 4 processor, while the Pro version uses a higher-end Snapdragon 8-series chip, with significant performance gains [63][64]

Group 4: Future Aspirations
- Xiaomi's CEO reaffirmed the company's commitment to developing its own SoC, with a planned investment of 50 billion yuan over the next decade aimed at high-end market penetration [68][69]
The Algorithm Behind Musk's New Model Comes from NVIDIA???
量子位· 2025-09-25 23:54
Core Viewpoint
- Grok-4-fast has demonstrated exceptional cost reduction and efficiency, surpassing even GPT-5, which is associated with routing capabilities [1][38]

Group 1: Performance and Efficiency
- Grok-4-fast's impressive reasoning efficiency is attributed to advanced scaling of computational power [2]
- Grok's underlying technology is linked to NVIDIA's algorithmic advances, particularly a new model called Jet-Nemotron [3][4]
- Jet-Nemotron-2B matches leading open-source models in quality while running roughly 53× faster [7]

Group 2: Technological Innovations
- The key innovation is a new framework called PostNAS, which sharply reduces training cost and allows broader exploration of model structures [10][11]
- PostNAS builds hybrid-architecture models that retain essential attention layers while pruning redundant ones for efficiency [13][14]
- The framework has four core components: full-attention layer placement, optimal linear-attention module selection, design of improved linear-attention modules, and hardware-aware architecture search [12]

Group 3: Attention Mechanisms
- The NVIDIA team evaluated six advanced linear-attention modules; Gated DeltaNet achieved the highest accuracy thanks to its data-dependent gating mechanism and delta rule [18][19]
- JetBlock, a more advanced linear-attention module, uses dynamic convolution to generate convolution kernels adaptively from input features, outperforming Gated DeltaNet on mathematical reasoning and retrieval tasks [21][24]

Group 4: Hardware Optimization
- NVIDIA's hardware-aware architecture search optimizes key parameters directly rather than relying on parameter count, which does not accurately reflect real hardware efficiency [27][28]
- The team found that key-value (KV) cache size is decisive for throughput in long-context and long-text generation, and targeted it for optimization [30][31]

Group 5: Industry Impact
- PostNAS offers the industry a low-cost, high-efficiency method of architecture exploration applicable to any pre-trained Transformer [34]
- Jet-Nemotron is open source, allowing vendors to integrate it without retraining, cutting costs substantially while preserving accuracy [36][42]
- Applying Jet-Nemotron across major AI companies such as OpenAI and Google could yield broad improvements in model performance and cost efficiency [43]
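To see why linear-attention modules shrink the KV cache, here is a sketch of the delta rule underlying modules like Gated DeltaNet: a fixed-size state matrix is updated per token and replaces the growing key-value cache, so memory stays constant with sequence length. The dimensions, the scalar per-token gate, and the NumPy formulation are illustrative assumptions, not NVIDIA's JetBlock code:

```python
import numpy as np

def delta_rule_attention(q, k, v, beta):
    """q, k, v: (seq_len, d); beta: (seq_len,) write-strength gates in [0, 1]."""
    seq_len, d = q.shape
    S = np.zeros((d, d))              # fixed-size recurrent state (no KV cache)
    outputs = np.empty_like(v)
    for t in range(seq_len):
        k_t, v_t = k[t], v[t]
        pred = S.T @ k_t              # what the state currently stores for k_t
        # Delta update: correct the state toward v_t instead of accumulating.
        S = S + beta[t] * np.outer(k_t, v_t - pred)
        outputs[t] = S.T @ q[t]       # read out with the query
    return outputs

rng = np.random.default_rng(0)
T, d = 6, 4
q, k, v = rng.normal(size=(3, T, d))
out = delta_rule_attention(q, k, v, beta=np.full(T, 0.5))
```

The state `S` is d×d regardless of sequence length, whereas a softmax-attention KV cache grows linearly with every generated token; that difference is exactly the throughput lever the hardware-aware search targets.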
OpenAI's Yang Song Poached by Meta! A Key Figure Behind the Rise of Diffusion Models Joins MSL, Reuniting with Tsinghua Alumnus Shengjia Zhao
量子位· 2025-09-25 13:00
Core Viewpoint
- Meta has successfully recruited Yang Song, a prominent researcher from OpenAI, drawing significant interest in the AI research community due to his notable contributions to diffusion models and generative modeling [1][6][7]

Group 1: Yang Song's Background and Achievements
- Yang Song is recognized as a key contributor to the rise of diffusion models and led OpenAI's Strategic Explorations Team [10][11]
- He entered Tsinghua University at the age of 16 and later earned his PhD from Stanford University, where he worked under the guidance of a notable professor [20][36]
- His most famous work includes the development of Consistency Models, which outperform diffusion models in speed, generating images significantly faster [12][14][17]

Group 2: Impact of Yang Song's Work
- Consistency Models can generate 64 images at 256×256 resolution in approximately 3.5 seconds, a substantial improvement over existing models [12][14]
- His research led to Continuous-Time Consistency Models, which address the stability and scalability issues of earlier versions, reaching a training scale of 1.5 billion parameters [15][18]
- These advances are considered potential game-changers in generative modeling, with some arguing they could "end" the dominance of diffusion models [18][19]

Group 3: Meta's Strategic Recruitment
- Recruiting Yang Song is part of Meta's broader strategy of strengthening its AI capabilities by attracting top talent from leading organizations such as OpenAI [9][10]
- The move is seen as a significant loss for OpenAI, with many colleagues expressing surprise at his departure [6][7]
- The motivations behind such moves are thought to extend beyond financial incentives, as many researchers prioritize impactful work and collaboration opportunities [9]
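For readers unfamiliar with why consistency models are so much faster: the model learns a single function that maps any noisy point on a diffusion trajectory straight back to its clean endpoint, enabling one-step (or few-step) sampling instead of hundreds of denoising iterations. A sketch of the defining conditions, following the standard notation of the consistency-models line of work (background math, not the article's own derivation):

```latex
% Self-consistency: one output for every point on the same probability-flow
% ODE trajectory
f_\theta(\mathbf{x}_t, t) = f_\theta(\mathbf{x}_{t'}, t'), \quad \forall\, t, t' \in [\epsilon, T]

% Boundary condition: near zero noise, the map is the identity
f_\theta(\mathbf{x}_\epsilon, \epsilon) = \mathbf{x}_\epsilon

% One-step sampling: draw noise and map it back in a single network evaluation
\mathbf{x} \approx f_\theta(\mathbf{z}, T), \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0},\, T^2 \mathbf{I})
```

A diffusion sampler must evaluate the network once per denoising step; the single evaluation in the last line is what makes generating a batch of images in seconds possible.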
GPT-5 Passes the "Gödel Test"! It Originally Solved Open Math Problems That Take PhD Students Days
量子位· 2025-09-25 13:00
Core Viewpoint
- GPT-5 has demonstrated the ability to solve complex mathematical optimization problems, succeeding on three of the five challenges presented by researchers and showcasing its advanced mathematical reasoning capabilities [2][21]

Group 1: GPT-5's Performance
- In a recent study, GPT-5 was tasked with solving five unsolved optimization conjectures and successfully solved three of them [2][21]
- The challenges required a level of mathematical understanding typically expected of PhD-level researchers rather than students [3][21]
- For one problem, GPT-5 generated a novel proof that differed from the researchers' expectations but was still valid [2][21]

Group 2: The Gödel Test
- The researchers called their assessment the "Gödel Test": problems requiring deep reasoning whose solutions cannot easily be found in the existing literature [10][11]
- The problems centered on submodular maximization, a class of combinatorial optimization characterized by diminishing returns [12][13]

Group 3: Problem-Solving Details
- In the first problem, GPT-5 had to maximize an objective combining monotone and non-monotone submodular functions under specific constraints, and it supplied a performance guarantee [23][24]
- In the second, it maximized a monotone submodular function under complex constraints, yielding a solution more reasonable than initially anticipated [39][40]
- The third problem involved maximizing a continuous monotone function under convex constraints; GPT-5's answer was generally correct but contained minor flaws [59][60]

Group 4: Limitations and Challenges
- GPT-5 struggled with the fourth and fifth problems, which required integrating insights from multiple sources, highlighting its limits in comprehensive reasoning [26][73]
- On the fourth problem it failed to provide a valid solution and merely restated known results; on the fifth, its output was deemed unreliable and unusable [70][81]

Group 5: Overall Assessment
- Overall, GPT-5 exhibited significant improvements in core mathematical capability compared to earlier models, particularly in combinatorial optimization [26][41]
- Its performance depended on the prompts provided: more detailed requests led to more complete and coherent answers [26][62]
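For context on the problem class, the textbook greedy algorithm for maximizing a monotone submodular function under a cardinality constraint (with its classic 1 − 1/e approximation guarantee) can be sketched as follows. The coverage objective below is a standard teaching example of diminishing returns, not one of the paper's conjectures:

```python
def coverage(selected, sets):
    """Monotone submodular objective: number of distinct elements covered."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_max(sets, k):
    """Pick k sets, each step adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for i in range(len(sets)):
            if i in chosen:
                continue
            gain = coverage(chosen + [i], sets) - coverage(chosen, sets)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:          # diminishing returns hit zero: stop early
            break
        chosen.append(best)
    return chosen

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
picked = greedy_max(sets, 2)
```

Diminishing returns means each set's marginal gain can only shrink as the solution grows; the open problems in the study involve much harder variants (non-monotone objectives, mixed and convex constraints) where such simple greedy guarantees no longer apply directly.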