Hands-on with Kimi's new Agent model "OK Computer": very OK
量子位· 2025-09-27 01:30
Core Viewpoint
- Kimi has launched a new Agent model named OK Computer, which showcases advanced capabilities in web development, data processing, and content generation [1][4][6].

Group 1: Design Tasks
- The new Agent can create a Pygame-themed webpage autonomously, including sections on the history of Pygame, game showcases, core features, and development tutorials, demonstrating its ability to design and implement content independently [9][10][12].
- The model generates a Todo List to track progress on tasks, marking completed items and allowing users to monitor the workflow [16].
- It can autonomously conduct web searches and generate the materials needed for webpage creation, showcasing its self-sufficiency in the design process [17].

Group 2: Generation Tasks
- The Agent was tasked with creating a children's story and visualizing it as a picture book, which included story writing, image generation, and audio production, highlighting its multi-modal content creation capabilities [20][21].
- Additionally, it successfully produced an editable PowerPoint presentation on China's top ten original musicals, demonstrating its proficiency in generating presentation materials [22][24][26].

Group 3: Analysis Tasks
- The Agent can handle data analysis tasks by searching for financial data and visualizing it, thus alleviating the burden of data collection and analysis from users [29][30].
- It can also analyze lengthy Excel documents and present the data in a clear and understandable manner, indicating its effectiveness in managing complex data sets [31][32].
The first open-source framework for 100% reproducible, stable RL training is here! Two runs match exactly
量子位· 2025-09-27 01:30
Core Insights
- The article discusses the achievement of the SGLang and slime teams in creating a fully reproducible and stable reinforcement learning (RL) training framework based on the Qwen3-8B model, addressing the issue of non-deterministic outputs in large language model (LLM) inference [1][2][6].

Group 1: Deterministic Inference
- The SGLang and slime teams have developed a deterministic inference solution that integrates batch invariant operators, CUDA Graph, radix cache, and chunked prefill, ensuring high performance while maintaining compatibility with key features [5][8].
- The implementation of batch invariant operators addresses the core cause of output uncertainty in LLM inference, which arises from varying batch sizes during dynamic batching [7][8].
- Testing has shown that the average performance drop for SGLang's solution is 34.35%, significantly better than the 61.5% decline reported by Thinking Machines Lab [5][12].

Group 2: Performance Metrics
- The article presents performance metrics for different inference modes, showing that deterministic modes yield consistent outputs across various batch sizes, with unique output counts significantly reduced [10][11].
- In terms of end-to-end latency, deterministic inference shows a performance drop of 25% to 45%, with specific backend performance metrics indicating improvements in certain configurations [12][13].

Group 3: Future Developments
- Future efforts will focus on optimizing batch invariant operators to enhance performance, particularly for RL inference, and expanding support to mixture of experts (MoE) models [16][18].
- The team aims to improve radix cache functionality and explore tensor parallelism to further enhance the capabilities of deterministic inference [18].
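The batch-size dependence described above can be reproduced in a few lines. A minimal sketch (our illustration, not the SGLang implementation): float32 addition is not associative, so reducing the same values in a different grouping, which is exactly what dynamic batching does, can change the result. Batch invariant operators fix the reduction order so the output no longer depends on how requests were batched.

```python
import numpy as np

# Same three values, two different reduction orders (as two different
# batch groupings would produce). float32 rounding makes them disagree.
a, b, c = np.float32(1e8), np.float32(1.0), np.float32(-1e8)

order1 = (a + b) + c  # b vanishes in rounding before c cancels a -> 0.0
order2 = (a + c) + b  # a and c cancel first, so b survives       -> 1.0
print(order1, order2)  # prints: 0.0 1.0

def fixed_order_sum(values):
    """Reduce left-to-right in one canonical order, independent of batching."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total

# A canonical reduction order always yields the same answer for the same data.
assert fixed_order_sum([a, b, c]) == fixed_order_sum([a, b, c])
```

Real batch invariant operators apply the same idea inside attention and matmul kernels, which is why they cost some throughput: the reduction schedule can no longer adapt to the batch shape.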
Qualcomm hosts a roundtable, and Unitree's Wang Xingxing delivers some plain truths
量子位· 2025-09-26 09:12
Core Viewpoint
- The article discusses the challenges and opportunities in the field of embodied intelligence and robotics, emphasizing the importance of collaboration among industry players to address technical difficulties and accelerate progress [3][25][48].

Group 1: Industry Challenges
- The current state of robotics is characterized by diverse technical routes, leading to a lack of significant progress despite the apparent excitement in the field [4][25].
- Many robotics and chip manufacturers overlook the critical role of chips in robotics, which is essential for enhancing performance and reliability [16][18].
- The industry faces difficulties in deploying large-scale computing power in robots due to space constraints, battery capacity, and heat dissipation issues [20][21].

Group 2: Technological Developments
- The goal of companies like Unitree Robotics is to develop universal AI for robots that can perform various tasks in unfamiliar environments, akin to a "ChatGPT moment" for robotics [11][12].
- The development stages for achieving advanced robotic capabilities include fixed action demonstrations, real-time action generation, task execution in unfamiliar settings, and achieving high success rates in delicate operations [12].
- The future of embodied intelligence in robotics may involve using mobile phone chips, which could provide significant potential for innovation [24].

Group 3: Collaboration and Open Source
- The article highlights the importance of open-sourcing models to foster collaboration and accelerate advancements in the field, similar to OpenAI's approach with earlier GPT models [28][29].
- Companies are encouraged to maintain an open attitude towards various models and collaborate with third parties to enhance development [30][31].

Group 4: AI and Agent Systems
- The article discusses the role of agent systems in AI, emphasizing the need for end-cloud collaboration to improve user experience and privacy [35][36].
- The demand for end-side models is increasing, as they are crucial for understanding user needs and facilitating communication with cloud models [39][40].
- The industry lacks a unified standard for AI applications across different devices, leading to high development costs and fragmentation [48][50].

Group 5: Future Directions
- The future of AI in robotics and other sectors will likely involve creating a cross-terminal operating system that integrates various services and enhances user experience [50][51].
- Collaboration among industry players is essential for building the necessary infrastructure and supporting innovation in smart devices [51].
A core figure behind Gemini joins xAI, and Musk personally rolls out the welcome!
量子位· 2025-09-26 09:12
Core Viewpoint
- Dustin Tran, a former senior researcher at Google DeepMind, has joined xAI. He is recognized for his significant contributions to the development of the Gemini AI model, which has achieved state-of-the-art reasoning capabilities and won multiple prestigious competitions [1][2][12].

Group 1: Dustin Tran's Contributions
- Tran played a pivotal role in the development of the Gemini product line, which helped Google regain its position in the AI landscape after being eclipsed by GPT [2][12].
- Under Tran's leadership, the Gemini series, particularly Gemini 1.5 Pro, excelled in various AI benchmarks, marking a significant turnaround for Google [15][16].
- Tran's team was instrumental in the rapid development of Gemini's predecessor, Bard, despite its initial poor reception [13][14].

Group 2: Transition to xAI
- Tran's decision to join xAI was influenced by three main factors: superior computing power, innovative data strategies, and alignment with Elon Musk's corporate philosophy [27][28][29].
- He expressed admiration for the extensive resources available at xAI, which he found unparalleled even during his tenure at Google [30][31].
- Tran believes that xAI has the potential to achieve rapid advancements in AI capabilities, surpassing other companies in a short timeframe [35][36].

Group 3: Background and Achievements
- Tran has an impressive academic background: he graduated from UC Berkeley, earned a master's degree from Harvard, and pursued a PhD at Columbia University [22].
- He has contributed to several influential projects and publications in the AI field, with over 24,000 citations on Google Scholar [25][23].
- His early career included a brief internship at OpenAI, where he was involved in notable projects such as the Dota 2 AI [21][19].
Who is the strongest "worker AI"? OpenAI ran the test itself, and the winner wasn't OpenAI
量子位· 2025-09-26 04:56
Core Insights
- OpenAI has introduced a new benchmark called GDPval to evaluate the economic value of AI models on real-world tasks, covering 44 occupations that together contribute $3 trillion annually to the U.S. GDP [2][15]
- Claude Opus 4.1 emerged as the best-performing model, with 47.6% of its outputs rated comparable to human expert results, while GPT-5 followed at 38.8% [4][6]
- OpenAI's models show linear performance improvement across generations, with significant advances in task accuracy and aesthetic capabilities [32][33]

Benchmark Overview
- GDPval focuses on nine key industries that each contribute over 5% to the U.S. GDP, selecting occupations whose work is primarily digital [14]
- A total of 44 occupations were identified; the recruited industry experts who designed the tasks average 14 years of experience [15][18]
- The tasks are based on real work deliverables, requiring an average of 7 hours to complete, with some complex tasks taking weeks [19]

Evaluation Methodology
- OpenAI employed a blind expert pairwise comparison method for task evaluation, achieving a 66% consistency rate with human expert ratings [26][27]
- Each task underwent multiple rounds of human expert review, ensuring high quality and relevance [23][24]

Model Performance
- The evaluation revealed that GPT-5 excels in accuracy on text-based tasks, while Claude demonstrates superior handling of various file formats, showcasing strong visual perception and design capabilities [33]
- OpenAI noted that combining AI models with human oversight could lead to more cost-effective and efficient task completion [35][36]

Limitations and Future Plans
- GDPval has limitations, including a small dataset of only 44 occupations and a focus on knowledge work that excludes physical labor [40]
- OpenAI plans to expand GDPval's scope and enhance its realism and interactivity in future iterations [41]
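The blind pairwise grading described above reduces to a simple win-rate computation: each comparison records whether the grader preferred the model's deliverable, the human expert's, or judged them a tie, and the headline number is wins plus ties over all comparisons. A hypothetical sketch with field names of our own invention, not GDPval's actual schema:

```python
from collections import Counter

# Toy comparison records: one per blind pairwise judgment.
# "verdict" says which deliverable the expert grader preferred.
judgments = [
    {"task": "legal_brief", "verdict": "model"},
    {"task": "cad_drawing", "verdict": "human"},
    {"task": "care_plan",   "verdict": "tie"},
    {"task": "sales_deck",  "verdict": "model"},
]

counts = Counter(j["verdict"] for j in judgments)

# "Comparable or better than the human expert" = wins + ties, as a share
# of all comparisons. This is the style of number quoted above (47.6%).
comparable_or_better = (counts["model"] + counts["tie"]) / len(judgments)
print(f"{comparable_or_better:.1%}")  # prints: 75.0%
```

Counting ties toward the model's score is what "comparable to human expert results" implies; a stricter wins-only rate would drop the tie term.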
A new interview with two OpenAI chiefs is packed with insight! The ultimate goal is an "automated researcher", and hiring isn't about finding the most high-profile people
量子位· 2025-09-26 04:56
Core Insights
- OpenAI's latest interview reveals significant advancements in GPT-5, focusing on long-term reasoning and the introduction of agentic behavior into mainstream applications [1][7][9]
- The company emphasizes the importance of protecting foundational research while avoiding distractions from short-term product competition [6][48]

Group 1: GPT-5 Developments
- GPT-5 aims to mainstream reasoning capabilities, moving beyond previous models that focused on immediate responses [8][10]
- The model represents a strategic shift towards enhancing reasoning and agentic behaviors, making it more accessible to users [9][10]

Group 2: Evaluation and Progress
- Current evaluation metrics are nearing saturation, necessitating new methods to assess models' abilities to discover new insights and achieve practical advancements in economically relevant areas [12][13]
- OpenAI plans to focus on the time span over which models can reason and make progress, with current capabilities reaching approximately 1 to 5 hours [23][25]

Group 3: Automation and Research Goals
- OpenAI's long-term goal is to develop an automated researcher capable of discovering new ideas, starting with internal research automation [20][21]
- The company is interested in measuring the duration of autonomous operation as a key evaluation metric [25]

Group 4: Reinforcement Learning (RL)
- Despite skepticism, reinforcement learning continues to thrive, with OpenAI exploring new directions and ideas [27][29]
- The evolution of reward models is expected to accelerate, simplifying the process of developing effective fine-tuning datasets [29][30]

Group 5: Programming and Coding
- OpenAI's GPT-5-Codex is designed to optimize programming tasks, addressing previous models' inefficiencies in allocating problem-solving time [32][34]
- The current state of coding tools is likened to the "uncanny valley": effective, but not yet fully comparable to human performance [37][41]

Group 6: Talent Acquisition and Research Culture
- OpenAI prioritizes persistence and the ability to learn from failure in its research culture, seeking individuals with a solid technical foundation [44][46]
- The company focuses on foundational research rather than merely following competitors, fostering an innovative environment [46][48]

Group 7: Resource Allocation
- If given additional resources, OpenAI would prioritize computational power, recognizing its critical role in research and development [49][51]
- The company maintains a long-term research focus, emphasizing the importance of computational resources and physical constraints in future advancements [52]
How was a high-quality dataset of over 10 trillion tokens forged? An interview with Ruan Yilong of China Telecom Tianyi AI
量子位· 2025-09-26 02:08
Core Viewpoint
- The article emphasizes the importance of high-quality datasets in developing and training AI models, highlighting that such datasets are crucial for enhancing model performance and accuracy [4][6][14].

Group 1: High-Quality Data Sets
- The company has amassed over 10 trillion tokens of general model corpus data and specialized datasets covering 14 key industries, with a total storage capacity of 350TB [1][6].
- These datasets are not just raw data but are meticulously labeled and optimized, making them ready for immediate application in various industries [3][4].
- High-quality datasets are essential as they directly influence the accuracy, generalization, and usability of AI models, serving as the foundation for effective model training [4][5].

Group 2: Technological Infrastructure
- The company has developed the Xingchen MaaS platform, which operates as a data refinery, creating a complete closed loop of "data-model-service" [6][17].
- The platform includes a data toolchain that efficiently processes various data types and a model toolchain that refines data into usable models, ensuring robust data lifecycle management [18][19].
- The platform's capabilities allow for the generation of synthetic data for rare or extreme scenarios, enhancing model robustness and safety [18][19].

Group 3: Strategic Considerations
- The company's investment in high-quality datasets is driven by national strategy, market demand, and its own operational advantages, positioning itself as a key player in the AI landscape [15][16].
- The government has recognized AI as a national strategy, prompting the company to build data infrastructure that supports AI technology breakthroughs [15][16].
- The company aims to leverage its extensive data resources and customer base to enhance its capabilities in high-quality dataset development [16].

Group 4: Industry Applications
- The company has successfully implemented AI solutions in various sectors, such as textile quality inspection, achieving over 95% accuracy in defect detection and significantly improving production efficiency [9][26].
- High-quality datasets have been developed for multiple industries, including healthcare, agriculture, and smart cities, demonstrating the versatility and impact of AI applications [36][37].
- The company has collaborated with various sectors to create tailored datasets that address specific industry challenges, enhancing operational efficiency and service quality [36][37].

Group 5: Future Vision
- The company envisions becoming a leading provider of general AI services, focusing on technological advancement, inclusive applications, and an open ecosystem for collaboration [42][43].
- It aims to cultivate a skilled workforce in AI, ensuring that technological innovations translate into practical applications that benefit society [43][44].
- The ultimate goal is to enhance the digital economy while ensuring safety and fairness in AI applications, contributing to a more equitable society [44][45].
"Zero-human" medical research: a Tsinghua AI agent runs autonomously from idea to paper
量子位· 2025-09-26 02:08
Contributed by the Suo Jinli research group, Department of Automation, Tsinghua University
量子位 | 公众号 QbitAI

Has medical research entered the "zero-human" era?!

The Suo Jinli research group at Tsinghua University's Department of Automation has released OpenLens AI, the first fully autonomous AI research framework designed specifically for medical informatics. For the first time, it closes the entire loop automatically: literature mining → experiment design → data analysis → code generation → submission-ready paper.

Why build such a system? Mainly because medical informatics research is stuck in an efficiency trap: multi-center data fusion, an explosion of knowledge, and the need for cross-disciplinary collaboration are stretching the traditional research model ever thinner.

OpenLens AI introduces medicine-specific quality-control methods, generates publication-grade research papers, and compresses the research cycle from months to hours, heralding a "zero-human" era for medical research.

The details follow.

Five core modules: the dream team of AI research

OpenLens AI not only automates the full workflow but also sets a new bar for quality control, integrating four safeguard mechanisms. It adopts a modular architecture in which five specialized agents work together to form a complete research-automation pipeline:

The Supervisor module acts as the global coordinator, decomposing user queries into structured subtasks and keeping the entire research process transparent and explainable.

The Literature Reviewer builds an autonomous knowledge-exploration pipeline, using a ReAct-based reasoning framework to retrieve and synthesize relevant literature, providing a solid theoretical foundation for the research ...
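The supervisor pattern described above can be sketched in a few lines: decompose a query into staged subtasks, dispatch each to a specialist agent, and keep an auditable log. Names and structure here are our own illustration, not OpenLens AI's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    stage: str
    goal: str
    done: bool = False

@dataclass
class Supervisor:
    query: str
    log: list = field(default_factory=list)

    def plan(self):
        # Fixed pipeline stages mirroring the article's closed loop.
        stages = ["literature_review", "experiment_design",
                  "data_analysis", "code_generation", "paper_writing"]
        return [Subtask(s, f"{s} for: {self.query}") for s in stages]

    def run(self, agents):
        for task in self.plan():
            agents[task.stage](task)   # dispatch to the specialist agent
            task.done = True
            self.log.append(task)      # auditable trail keeps the flow explainable
        return self.log

# Stub agents stand in for the real LLM-backed modules.
agents = {s: (lambda t: None) for s in
          ["literature_review", "experiment_design",
           "data_analysis", "code_generation", "paper_writing"]}
trail = Supervisor("imaging biomarkers for sepsis").run(agents)
print([t.stage for t in trail])
```

In a real system each stub would be an LLM-backed agent (the Literature Reviewer would wrap a ReAct loop over retrieval), but the coordination skeleton is the same.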
Up to 3.2x faster multimodal inference! Huawei Noah's Ark Lab's new algorithm accepted to NeurIPS 2025
量子位· 2025-09-26 02:08
Contributed by the ViSpec team
量子位 | 公众号 QbitAI

Accelerate multimodal large-model inference by up to 3.2x without sacrificing any generation quality!

The latest research from Huawei's Noah's Ark Lab has been accepted to NeurIPS 2025.

Speculative decoding has offered only limited speedups for VLMs

The multimodal capabilities of large models are advancing at an unprecedented pace, but a chronic problem is becoming ever more pronounced: inference speed.

When a model has to "look at an image" while "talking", especially when generating long, richly illustrated responses, compute cost and latency climb sharply, which severely limits VLM deployment in real-time interaction and edge scenarios.

To make large models "speak" faster, academia and industry widely adopt speculative decoding. It works like a clever "adviser" (a small draft model) paired with a decisive "lord" (the large target model).

To date, speculative decoding has become a standard move for accelerating large language model (LLM) inference, but applying it to multimodal large models (VLMs) has been an uphill struggle: existing methods achieve speedups below 1.5x, with limited gains.

To address this, Huawei's Noah's Ark Lab has proposed Vision-Aware Speculative Decoding (ViSpec), a new inference-acceleration framework designed specifically for vision-language models, the first in this area to ...
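The "adviser and lord" division of labor is the standard speculative sampling loop: the draft model proposes tokens cheaply, and the target model verifies them, accepting each with probability min(1, p_target/p_draft) and resampling from the residual distribution on rejection. A toy sketch of the generic technique (not ViSpec's vision-aware extension), with stand-in probability tables instead of real models:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_model(_ctx):
    # Stand-in draft: cheap, uniform, and slightly wrong.
    return {t: 1 / len(VOCAB) for t in VOCAB}

def target_model(_ctx):
    # Stand-in target: the distribution the output must match exactly.
    return {"the": 0.4, "cat": 0.3, "sat": 0.1, "on": 0.1, "mat": 0.1}

def speculate(ctx, k=4):
    """Draft k tokens, verify each against the target model."""
    accepted = []
    for _ in range(k):
        q = draft_model(ctx + accepted)
        token = random.choices(list(q), weights=list(q.values()))[0]
        p = target_model(ctx + accepted)
        if random.random() < min(1.0, p[token] / q[token]):
            accepted.append(token)  # draft token verified, keep going
        else:
            # Rejected: resample from the residual distribution max(p - q, 0)
            # so the combined procedure samples exactly from the target.
            resid = {t: max(p[t] - q[t], 0.0) for t in VOCAB}
            z = sum(resid.values()) or 1.0
            accepted.append(random.choices(
                list(resid), weights=[v / z for v in resid.values()])[0])
            break  # later drafts were conditioned on a rejected prefix
    return accepted

print(speculate(["draw:"]))
```

Because the accept/reject rule preserves the target distribution exactly, the speedup comes purely from verifying several draft tokens per target-model pass; the quoted speedup figures measure how many drafts survive verification.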
ChatGPT's new feature wants to be the first app you open in the morning
量子位· 2025-09-26 02:08
Core Insights
- ChatGPT has introduced a new feature called ChatGPT Pulse, which delivers personalized updates without the need for user prompts, functioning as a proactive assistant [1][5][6]
- The feature learns from user interactions and integrates with calendars and emails to provide tailored content, including daily briefings and suggestions [8][9][10]
- Currently, this feature is available only to Pro users, indicating a potential strategy to enhance user engagement and subscription value [15]

Group 1
- ChatGPT Pulse represents a shift from a reactive to a proactive AI assistant, capable of monitoring important tasks and providing timely information [5][6]
- The system generates a personalized "core dynamic" briefing based on user data, which may include updates on events, vocabulary lessons, and meal suggestions [8][9]
- User feedback on the Pulse experience is used solely to enhance that user's own interactions, ensuring a customized experience [11][13]

Group 2
- The feature is designed to avoid overwhelming users with constant notifications, focusing instead on efficient problem-solving [10]
- ChatGPT Pulse can suggest activities and plans based on the user's schedule and preferences, demonstrating its utility as a personal assistant [13][14]
- The introduction of this feature aligns with OpenAI's vision of creating intelligent agents that operate alongside users, enhancing productivity and daily life [5][6]