量子位
Cracking the code of human-AI collaboration: split work skills into two layers, let AI execute and humans decide, and success rates soar | ICML 2025
量子位· 2025-08-27 05:49
Core Viewpoint
- The paper presents a mathematical framework that decomposes work skills into two levels, highlighting the complementary strengths of humans and AI, which lead to a higher overall success rate when combined than when working independently [2][4].

Group 1: Human-AI Collaboration
- The research shifts the discussion from whether AI will replace humans to how work value is fundamentally reshaped, emphasizing that technology replaces or supplements specific tasks rather than entire jobs [6].
- The authors propose a new framework that breaks work down into skill units, further dividing each skill into decision-making and execution components [8][12].
- The case study of a software engineer illustrates that while AI tools like GitHub Copilot automate execution tasks, the engineer's value increases as their role shifts to supervision and decision-making [11][14].

Group 2: Mathematical Framework
- The mathematical model quantifies the new division of labor, allowing job success probability to be assessed from the combination of human and AI capabilities [16][18].
- The model reveals a phase-transition phenomenon in job success probability: small improvements in decision-making skill can produce large jumps in the success rate [18][21].
- The framework provides a tool for evaluating the match between worker capabilities and job requirements, moving beyond traditional performance metrics [26].

Group 3: Practical Implications
- The research suggests reshaping skill-development paths to emphasize decision-making rather than mere execution, since execution skills are more susceptible to displacement by AI advances [27][28].
- Organizations should recruit for complementary skills rather than seek all-rounders, identifying individuals with strong decision-making abilities who may need support in execution [30][31].
- The framework emphasizes designing systems that recognize and enhance human judgment, as the AI wave separates execution from decision-making [32][33].
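The compounding effect behind the reported phase transition can be illustrated with a toy calculation. Everything below is an illustrative assumption — the success formula, the number of skills, and all probability values are invented for this sketch, not taken from the paper:

```python
# Toy model: a job is a chain of N skills; each skill needs a decision
# step and an execution step, and the job succeeds only if every step
# succeeds. All numbers are illustrative, not from the paper.

def job_success(decision_p, execution_p, n_skills):
    """Probability that every decision and execution step succeeds."""
    return (decision_p * execution_p) ** n_skills

N = 20

# Working alone: a human strong at decisions, weaker at execution.
human_alone = job_success(decision_p=0.95, execution_p=0.80, n_skills=N)

# Working alone: an AI strong at execution, weaker at decisions.
ai_alone = job_success(decision_p=0.80, execution_p=0.99, n_skills=N)

# Complementary pairing: human decides, AI executes.
combined = job_success(decision_p=0.95, execution_p=0.99, n_skills=N)

print(f"human alone: {human_alone:.3f}")   # ~0.004
print(f"AI alone:    {ai_alone:.3f}")      # ~0.009
print(f"combined:    {combined:.3f}")      # ~0.293

# "Phase transition" flavor: because errors compound across skills,
# a small gain in decision quality moves success disproportionately.
for d in (0.90, 0.95, 0.99):
    print(f"decision_p={d:.2f} -> success={job_success(d, 0.99, N):.3f}")
```

Under this toy model, pairing the stronger decision-maker with the stronger executor beats either party alone by an order of magnitude, and nudging decision quality from 0.90 to 0.99 multiplies the success rate several times over — the compounding across many skills is what makes the curve so steep.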
Digital technical workers have clocked in! Time-series large models plus agents have mastered factory production control, understanding operating conditions better than humans
量子位· 2025-08-27 04:15
Core Viewpoint
- The emergence of "digital technical workers" powered by AI and large models is transforming industrial operations, filling the gap left by the shortage of experienced human experts in complex production processes [1][2][3].

Group 1: Digital Technical Workers
- Digital technical workers can quickly take on roles in industrial settings, having undergone extensive pre-training before deployment, which allows them to integrate into production environments immediately [3][4].
- These workers can perform complex tasks such as dynamic ammonia synthesis, hydrogen production through electrolysis, and waste-incineration power generation, which traditionally required expert human oversight [2][12].

Group 2: Capabilities of Digital Workers
- The digital workers possess three core capabilities: perception (real-time data acquisition), cognition and decision-making (using large models for management decisions), and execution (operating industrial software and hardware) [5][6][11].
- The company's platform, known as "He Gu," enables these digital workers to serve effectively in various roles, including equipment operator, process supervisor, energy manager, safety officer, and planner [11][12].

Group 3: Technological Innovations
- The core technological advance is a self-developed industrial time-series model that processes time-dependent data, enabling accurate prediction and decision-making in industrial contexts [19][20].
- Combining time-series models with large language models enhances the digital workers' ability to learn from historical data and apply it to real-time operational decisions [21][26].

Group 4: Market Demand and Industry Trends
- The industrial sector faces a significant shortage of skilled labor, particularly in traditional fields like chemical engineering, prompting a shift toward AI solutions to fill these gaps [43][44].
- Companies increasingly recognize the need for intelligent, autonomous solutions to address both labor shortages and safety concerns in high-risk environments [46][49].

Group 5: Business Model Innovations
- The company offers two commercial cooperation models for deploying digital workers, a one-time purchase model and a pay-per-use model, giving enterprises flexibility [50][51].
- This approach aims to free human workers from repetitive and hazardous tasks, enabling them to focus on more creative, higher-value roles [52].
Alibaba open-sources a 14B cinema-grade video model! Hands-on test: free to use, with single generations up to minutes long
量子位· 2025-08-27 02:24
Core Viewpoint
- The article highlights the launch of Alibaba's new AI video-generation model, Wan2.2-S2V, which lets users create high-quality digital-human videos from just an image and an audio clip, marking a significant advance in AI video technology [1][3].

Group 1: Model Features
- Wan2.2-S2V delivers improved naturalness and fluidity in character movements, particularly when generating varied cinematic scenarios [3].
- The model can generate videos minutes in length, with stability and consistency, along with cinema-level audio-driven capabilities [5].
- It supports advanced action and environment control based on user instructions [5].

Group 2: User Experience
- The model has been well received, with many users sharing positive experiences and creative applications, such as animated characters reciting poetry [6][15].
- Users can access the model for free on the Tongyi Wanxiang website, where they can upload audio or choose from a voice library [2][11].

Group 3: Technical Innovations
- Wan2.2-S2V was trained on a dataset of over 600,000 audio-video segments, using mixed parallelism for full-parameter training to enhance model performance [19].
- The model integrates text-guided global motion control with audio-driven fine-grained local motion to achieve complex scene generation [19].
- It introduces AdaIN and CrossAttention mechanisms to synchronize audio and visuals effectively [20].

Group 4: Model Capabilities
- The model can generate long videos through hierarchical frame compression, expanding the number of motion frames from several to 73 [21].
- It supports multi-resolution training, enabling video generation in various formats, including vertical short videos and horizontal films [22].
- With the release of Wan2.2-S2V, Alibaba's Tongyi model family has surpassed 20 million downloads across open-source communities and third-party platforms [23].
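AdaIN (adaptive instance normalization) is a standard conditioning mechanism: normalize the content features, then rescale them with statistics taken from a conditioning signal. How exactly Wan2.2-S2V wires it to audio features is not detailed in the article, so the sketch below shows only the textbook operation, with the audio/visual pairing as an assumption:

```python
import numpy as np

# Textbook AdaIN: normalize content features, then re-inject the
# conditioning signal's per-channel mean and std. Treating "content"
# as visual features and "cond" as audio embeddings is an assumption
# about Wan2.2-S2V, not a documented detail.

def adain(content, cond, eps=1e-5):
    """content, cond: arrays of shape (channels, length)."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mean = cond.mean(axis=1, keepdims=True)
    s_std = cond.std(axis=1, keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)  # zero-mean, unit-std
    return normalized * s_std + s_mean               # carry cond statistics

rng = np.random.default_rng(0)
visual = rng.normal(5.0, 2.0, size=(8, 64))  # stand-in visual feature map
audio = rng.normal(0.0, 1.0, size=(8, 64))   # stand-in audio embedding
out = adain(visual, audio)

# Each output channel now carries the audio channel's statistics.
print(np.allclose(out.mean(axis=1), audio.mean(axis=1), atol=1e-6))  # True
```

The appeal for audio-visual sync is that the operation is cheap and differentiable: the visual stream keeps its spatial structure while its per-channel statistics track the audio, which a CrossAttention layer can then refine at finer granularity.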
Musk's Starship makes history again! After three explosions and three postponements, the test finally succeeds, completing payload deployment in space
量子位· 2025-08-27 02:24
Core Viewpoint
- The successful tenth test flight of SpaceX's Starship marks a significant milestone, demonstrating advances in technology and operational capability after previous setbacks [1][57].

Group 1: Launch Details
- The Starship S37 successfully landed in the Indian Ocean after its launch on the evening of the 26th (US Central Time) [1].
- The flight paired the B16 booster with the S37 spacecraft and achieved all experimental objectives, including booster return and payload deployment [10][5].
- The launch was originally scheduled for late June but faced multiple delays due to technical issues and weather conditions [45][48].

Group 2: Technical Achievements
- The test flight included pressure-testing vulnerable areas by removing tiles from the spacecraft [10].
- The booster successfully completed its return burn, a critical step in the return process [21].
- The S37 spacecraft released eight payloads, with each release celebrated by the SpaceX team [30][32].

Group 3: Historical Context
- This successful flight follows three consecutive failures earlier in 2025, in which the spacecraft was lost in explosions [53][54].
- The transition from hydraulic to electric thrust-vector-control systems represents a significant technological upgrade in the second-generation Starship [51].
- A new heat-shield design, including actively cooled metal tiles, is being tested to improve performance during re-entry [52].
DeepSeek's "极你太美" bug draws an official response
量子位· 2025-08-27 02:24
Core Viewpoint
- The article discusses a significant bug in the DeepSeek V3.1 model that has caused widespread concern among developers: the character "极" unexpectedly appears in generated code outputs, leading to potential compilation failures and problems in high-precision tasks [1][2][11].

Summary by Sections

Bug Discovery and Impact
- Developers report that during API calls for code development, the output occasionally includes the character "极", which can disrupt the coding process [2][5].
- The issue was first identified on platforms such as Volcano Engine and Chutes, but it has since affected other platforms, including Tencent's CodeBuddy and DeepSeek's official channels [5].

Community Response and Solutions
- The community attributes the bug to the DeepSeek V3.1 model, and CodeBuddy has reached out to DeepSeek for a fix in an upcoming version [12].
- Users have begun sharing tips to mitigate the "极" bug, such as using specific prompt patterns that avoid triggering the issue [14][18].

Analysis of the Bug's Origin
- A Zhihu user, Huang Zhewai, suggested that this bug is not an isolated incident and may reflect a "malicious pattern" in large-model code generation [21].
- Huang observed similar issues in earlier models, where output would unexpectedly include terms like "极长" after a series of repetitions, indicating a potential flaw in the model's reasoning process [21][22].
- He hypothesized that the root cause may be inadequate data cleaning during the supervised fine-tuning (SFT) phase, leading the model to learn "极" as a termination marker [22].

Future Outlook
- Resolving the "极" bug depends on DeepSeek releasing a new version that addresses the underlying issues [24].
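Until a fixed model version ships, a consumer-side guard can at least surface the problem before code reaches a compiler. The following is a hypothetical heuristic, not DeepSeek's fix, and it is deliberately naive about comments and string literals, where the character can of course appear legitimately:

```python
import re

# Hypothetical post-processing guard for the reported "极" bug: flag
# the character when it appears outside obvious comments and quoted
# strings in generated code. A heuristic sketch only; real use would
# need language-aware parsing.

SUSPECT = "极"

def find_suspect_tokens(code: str):
    """Return (line_number, line) pairs where 极 survives after naive
    removal of #/// comments and single/double-quoted strings."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        stripped = re.sub(r'(#|//).*$', '', line)            # drop comments
        stripped = re.sub(r'"[^"]*"|\'[^\']*\'', '', stripped)  # drop strings
        if SUSPECT in stripped:
            hits.append((lineno, line))
    return hits

sample = 'timeout = 30极00\nname = "极光"  # legitimate string\n'
print(find_suspect_tokens(sample))  # [(1, 'timeout = 30极00')]
```

A flagged line can then be rejected and regenerated rather than silently passed downstream, which is cheaper than debugging a compilation failure after the fact.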
New work from NVIDIA's Song Han team: an efficient language model built with post neural architecture search
量子位· 2025-08-26 08:11
Shiling, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

NVIDIA's open-source effort strikes again!

Song Han's team has released Jet-Nemotron, a new efficient language model built on post neural architecture search.

Across a series of benchmarks, the model matches or exceeds the accuracy of Qwen3, Qwen2.5, Gemma 3, and Llama 3.2, while achieving up to a 53.6x speedup in generation throughput and a 6.1x speedup in the prefill stage.

Notably, on the MMLU, MMLU-Pro, and BBH benchmarks, Jet-Nemotron-2B delivers 47x higher throughput than Qwen3-1.7B-Base while shrinking the cache to 1/47 the size.

It also achieves higher accuracy than DeepSeek-V3-Small and Moonlight (15 billion total parameters, 2.2 billion activated).

Both the code and the pretrained models will be open-sourced. First, let's look at how Jet-Nemotron is built.

Jet-Nemotron: built on post neural architecture search

First, Jet-Nemotron is built on Post Neural Architecture Search (PostNAS). PostNAS is an architecture-search approach that "retrofits while standing on the shoulders of large models" ...
GPT-5 sets a record completing Pokémon Crystal! It defeats Red in 9,517 steps, three times as efficient as o3!
量子位· 2025-08-26 08:11
Core Viewpoint
- GPT-5 has demonstrated exceptional performance in completing the game Pokémon Crystal, defeating the final boss, Red, in far fewer steps than its predecessor, o3, showcasing advances in AI capability and gaming efficiency [1][3][21].

Summary by Sections

Performance Comparison
- GPT-5 completed Pokémon Crystal in just 9,517 steps, while o3 took 27,040, making GPT-5 nearly three times as efficient [3][4].
- An average human player typically takes around 5 days (approximately 40 hours) to finish the game [5].
- In the main storyline, GPT-5 used only 9,205 steps to collect all 16 badges, compared with o3's 22,334 [10].

Efficiency in Gameplay
- From badge collection to defeating Red, GPT-5 needed only 312 steps, while o3 needed nearly 5,000, a more-than-tenfold speedup [11].
- During the Elite Four and Champion battles, GPT-5 used 7,329 steps to o3's more than 18,115, again highlighting GPT-5's superior efficiency [14].

AI Model Capabilities
- GPT-5's success is attributed to a lower "hallucination" rate, better spatial reasoning, and improved goal planning compared with o3 [21].
- Its ability to plan longer action sequences with minimal errors saved significant time during gameplay [21].

Benchmarking AI Models
- The article discusses the trend of AI models, including Google's Gemini and Anthropic's Claude, attempting Pokémon games with varying degrees of success [23][24].
- Pokémon games serve as a benchmark for evaluating AI models' contextual understanding, decision-making, and interface-control capabilities [29].

Cost of AI Gaming
- The cost of using GPT-5 for gaming is substantial: estimates suggest that completing Pokémon Red (about half the length of Pokémon Crystal) could cost around $3,500 [30].
- The article notes that unless one works at OpenAI, the financial barrier to using Pokémon as an AI benchmark is significant [31].
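A quick sanity check of the efficiency claims, using only the step counts quoted in the article:

```python
# Step counts reported in the article for completing Pokémon Crystal.
steps = {
    "full_run":   {"gpt5": 9517, "o3": 27040},
    "badges_16":  {"gpt5": 9205, "o3": 22334},
    "elite_four": {"gpt5": 7329, "o3": 18115},
}

for phase, s in steps.items():
    ratio = s["o3"] / s["gpt5"]
    print(f"{phase}: o3 used {ratio:.2f}x the steps of GPT-5")
# full_run:   2.84x  -> consistent with "nearly three times as efficient"
# badges_16:  2.43x
# elite_four: 2.47x
```

The full-run ratio (2.84x) supports the headline "nearly three times" claim; the per-phase ratios show the gap is broad-based rather than driven by one segment.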
Alibaba veteran builds a talking Ultraman Tiga! One AI toy sells 200,000 units as Sequoia and others rush into a 200-million-yuan Series A
量子位· 2025-08-26 08:11
Core Viewpoint
- The article discusses the launch of the world's first AI toy based on the Ultraman franchise, highlighting how the company Haivivi uses AIGC technology to create interactive plush toys that provide emotional value rather than just functional value [2][8][50].

Group 1: Product Launch and Features
- Haivivi has released the CocoMate Ultraman toy, its second product after the first AI toy, BubblePal [8][11].
- The new toy features significant improvements, including a core AI module called CocoMate that supports interactive voice responses and character-specific dialogue [20][21][43].
- The product went through extensive user-feedback iterations, addressing issues such as battery life and the connectivity limitations of the first generation [25][26].

Group 2: Funding and Market Strategy
- Haivivi has completed a Series A financing round of 200 million yuan, with participation from notable investors including CICC Capital and Sequoia China [8].
- The company aims to leverage China's strengths in smart manufacturing while focusing on the emotional connection between AI toys and users [7][50].

Group 3: Technological Innovations
- The CocoMate toy adds advanced features such as a 4G SIM card for connectivity, a larger 3,000 mAh battery, and an NFC card system for unlocking different storylines [33][34].
- The AI toy uses an end-to-end voice model, improving response speed and interaction quality [27][45].

Group 4: Future Directions
- Haivivi plans to expand its product lines to cover both children's toys and adult-oriented emotional-companion devices, with upcoming products expected to feature popular IPs [58][62].
- The company emphasizes the importance of emotional memory in AI interactions, aiming to build lasting connections with users [56][65].
What new opportunities remain in the large-model developer ecosystem? Find the answers at the Bund on September 13 | Registration open
量子位· 2025-08-26 05:46
Core Viewpoint
- The forum "AI Open Source Era: Building Global Ecosystem and Sustainable Growth" will explore the core logic of the AI open-source ecosystem from multiple perspectives, highlighting trends and practices in the field [1][5].

Group 1: Forum Overview
- The forum will feature three keynote speeches analyzing the global large-model open-source ecosystem, community practices, and the competitive landscape of open-source models [1][2].
- Keynote speakers include Wang Xu from Ant Group, Chen Yingda from the ModelScope community, and Yang Pan from SiliconFlow, each offering insights into a different aspect of the AI open-source landscape [1][6][10].

Group 2: Keynote Topics
- Wang Xu will present a panoramic view of the global large-model open-source ecosystem and its trends, using community data as a reference for technical decision-making [1][6].
- Chen Yingda will share the construction experience behind hosting over 90,000 quality models and how the "Model as a Service" (MaaS) concept drives the evolution of the open-source ecosystem [1][8].
- Yang Pan will analyze the competitive and collaborative dynamics of the global open-source model ecosystem, focusing on the transition from belief in the technology to confidence in it [1][9].

Group 3: Roundtable Discussions
- Following the keynotes, two roundtable discussions will focus on Vibe Coding and AI Agents, addressing real-world applications, open problems, and future possibilities in human-machine collaboration [2][11].
- The discussions will feature practitioners and entrepreneurs from organizations including Ant Group and ByteDance, offering multi-dimensional insights into the evolution of AI coding products and the path toward AGI [2][13][15].

Group 4: Event Logistics
- The forum will take place in Hall C2 of the Expo Garden in Huangpu District, Shanghai, with capacity limited to 350 professional-audience seats [2][5].
- Registration for professional attendees is now open, inviting participants to join the discussions and seize technological opportunities [2].
Squeezing every drop of GPU performance: ZTE's Mariana breaks through the GPU-memory barrier
量子位· 2025-08-26 05:46
NVIDIA's open-source Dynamo project implements a multi-tier caching algorithm for the storage system: hot data lives in GPU memory, warm data in host memory, and cold data on SSD or remote object storage, with a unified index and an asynchronous pipeline providing automatic migration and transparent access. However, data migration across storage tiers is a complex process, and its latency overhead is hard to compress.

Microsoft's LMCache storage system is highly compatible with inference frameworks such as vLLM, but its support for distributed storage is limited and its capacity ceiling is low.

Alibaba has proposed a remote-storage scheme that extends the KV Cache space into the Tair database; storage capacity scales easily, but read/write performance struggles to meet the low-latency demands of LLM inference workloads.

CXL (Compute Express Link), an emerging high-speed interconnect technology with high bandwidth, low latency, and hardware-level cache coherence, brings new hope for breaking the memory bottleneck, addressing the memory constraints encountered in AI and high-performance computing.

Industry research on using CXL storage to accelerate LLM inference is still scarce. Exploring how to extend KV Cache space with new media such as CXL, and migrating mature software stacks to CXL hardware scenarios, is therefore highly worthwhile work.

As large language models (LLMs) reach every industry, the tension between inference efficiency and GPU-memory cost grows ever sharper. KV Cache (Key-Value Cache), the core technique for accelerating generation, acts like a ...
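The tiering scheme described for Dynamo (hot data in GPU memory, warm data in host memory, cold data on SSD or remote storage, behind a unified index with transparent access) can be sketched in miniature. Tier capacities and the synchronous LRU promotion/demotion policy below are illustrative assumptions; real systems like Dynamo migrate data asynchronously:

```python
from collections import OrderedDict

# Sketch of a multi-tier KV-cache index: "hot" (GPU memory), "warm"
# (host RAM), "cold" (SSD/remote). Tiny capacities and synchronous
# LRU demotion are illustrative only.

class TieredKVCache:
    def __init__(self, hot_cap=2, warm_cap=4):
        self.tiers = {"hot": OrderedDict(), "warm": OrderedDict(),
                      "cold": OrderedDict()}
        self.caps = {"hot": hot_cap, "warm": warm_cap}

    def put(self, key, value):
        """New/updated blocks always land in the hot tier."""
        self.tiers["hot"][key] = value
        self.tiers["hot"].move_to_end(key)
        self._demote("hot", "warm")
        self._demote("warm", "cold")

    def get(self, key):
        """Unified index: callers see one namespace regardless of
        where a block physically lives; hits are promoted to hot."""
        for tier in ("hot", "warm", "cold"):
            if key in self.tiers[tier]:
                value = self.tiers[tier].pop(key)
                self.put(key, value)  # promote back to hot
                return value
        return None

    def _demote(self, src, dst):
        """Spill least-recently-used blocks when a tier overflows."""
        while len(self.tiers[src]) > self.caps[src]:
            k, v = self.tiers[src].popitem(last=False)
            self.tiers[dst][k] = v

cache = TieredKVCache()
for i in range(7):                     # insert KV blocks 0..6
    cache.put(f"block{i}", f"kv{i}")
print(list(cache.tiers["hot"]))   # ['block5', 'block6']  (most recent)
print(list(cache.tiers["cold"]))  # ['block0']            (oldest, spilled)
```

The "transparent access" property is the point: `get` hides which medium holds a block, so swapping the cold tier from SSD to a CXL-attached memory pool (as the Mariana work explores) changes only the backing store, not the caller-facing interface.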