OpenAI's Altman: Jobs That ChatGPT Can Eliminate Are Not Real Work
QbitAI (量子位) · 2025-10-13 08:47
Core Insights
- The discussion highlights the evolving role of AI in the workplace, suggesting that many current jobs may not represent "real work" as AI capabilities advance [30]
- The conversation also touches on the development of GPT-6 and the potential for AI to achieve AGI (Artificial General Intelligence) [18][19]

Group 1: AI Development and Applications
- Sam Altman expresses excitement about the integration of applications into ChatGPT, emphasizing the potential for developers to create innovative solutions using the Agent Builder and Agent Kit [5][6]
- The conversation indicates that ChatGPT has reached 800 million weekly active users, positioning it as a new distribution platform for developers [5]
- Altman notes significant advancements in model capabilities over the past two years, allowing for easier and more complex system development with minimal coding [7][8]

Group 2: Future of Work and AI Impact
- The dialogue suggests that the number of software applications created will increase dramatically, and the time required for testing and refining ideas will decrease significantly [9]
- Altman predicts that the first billion-dollar company operated by agents is still a few years away, but the technology is progressing rapidly [11][12]
- The concept of "workslop," where AI-generated content requires additional human editing, is discussed, highlighting the need for education on effective AI usage [21][22]

Group 3: AGI and Its Implications
- Altman defines AGI as AI surpassing human capabilities in high-value economic tasks, noting that current AI can make novel discoveries, albeit on a small scale [19][20]
- The conversation emphasizes the importance of recognizing both the potential and limitations of AI advancements, with a focus on gradual progress towards AGI [18][19]

Group 4: AI in Communication and Interaction
- Altman argues that voice may not be the ultimate form of interaction with AI, suggesting that various modes of communication will coexist [39][40]
- The potential for real-time video interactions is highlighted as a valuable path towards achieving AGI [26]

Group 5: Business Models and Future Directions
- The discussion includes thoughts on potential revenue models for new applications like Sora, with considerations for user engagement and monetization strategies [27][28]
- Altman expresses optimism about the future of AI and its ability to create new opportunities, while also acknowledging the need for a global framework to manage risks associated with powerful AI models [33]
Sora 2 "Resurrects" Deceased Celebrities; Families Strongly Object
QbitAI (量子位) · 2025-10-13 08:47
Core Viewpoint
- The rapid rise of Sora 2 has brought the issue of portrait rights back into focus, particularly concerning the use of deceased celebrities' images for AI-generated content [1][18].

Group 1: Reactions from Family Members
- Family members of deceased celebrities, such as Robin Williams' daughter, have expressed strong discontent regarding AI-generated videos that use their loved ones' likenesses, calling them disrespectful and painful [4][20].
- Zelda Williams has publicly asked people to stop sending her AI videos of her father, emphasizing that this is not what he would have wanted [5][6][20].
- Family members of other deceased public figures have echoed these sentiments, indicating a broader concern about the use of AI in this context [24].

Group 2: Legal and Ethical Considerations
- There is a growing consensus that the portrait rights of deceased celebrities should be inherited by their relatives or relevant organizations, highlighting the need for updated copyright laws in light of rapid AI advancements [8][10].
- OpenAI has acknowledged the importance of free speech in depicting historical figures but asserts that public figures and their families should ultimately control how their likenesses are used [25][26].
- The Motion Picture Association (MPA) has reported a surge in infringement of its members' works since the launch of Sora 2, indicating a pressing need for stronger copyright protections [27][28].

Group 3: Future Implications
- The ongoing debate surrounding Sora 2's copyright issues raises questions about the future of AI-generated content and the rights of creators and their estates [29][30].
Freshly Nobel-Winning Work Has Already Been Turned into a Chip
QbitAI (量子位) · 2025-10-13 03:35
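By Lu Yu (鹭羽), from Aofeisi
QbitAI | Official account QbitAI

Who says MOFs (metal-organic frameworks), the work that just won the Nobel Prize in Chemistry, are "useless"?

This class of new materials was dismissed decades ago as "theory only, with no practical application." It had barely received Nobel recognition before it was turned into a chip. (Impressive foresight from the Nobel committee.)

This is the latest result just published by scientists at Monash University: using MOFs to build an ultra-miniature fluidic chip.

Unlike a conventional chip, it can not only perform ordinary computation but also remember earlier voltage changes, forming a short-term memory similar to that of neurons in the brain.

As the authors put it, this may be a template for a new generation of computers:

If we can design functional materials like MOFs that are only a few nanometers thick, we can build advanced fluidic chips that complement, or even overcome, some of the limitations of today's electronic chips.

A nanofluidic chip with "brain-like" memory pathways

Ion-selective transport under nanoscale confinement is showing promise in the emulation of biological mechanisms, ion separation, and iontronic devices. However, because high-precision nanochannel devices are hard to fabricate, achieving tunable, nonlinear ion transport is quite difficult.

A nanofluidic chip made from MOF materials solves this problem.

MOFs have well-defined channel structures and accommodate a wide range of chemical compositions, so molecular and ion transport can be tuned with atomic-level precision.

Building on this, the researchers constructed a hierarchical nanofluidic transistor device, h-MOF ...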
Meta's "Segment Anything" 3.0 Revealed! Semantic Segmentation Gains Concept Prompts; Great Fun and About to Take Off
QbitAI (量子位) · 2025-10-13 03:35
Core Viewpoint
- The article discusses the introduction of SAM 3, a third-generation segmentation model that enhances interactive segmentation by understanding natural language prompts, allowing for more intuitive and flexible image and video segmentation tasks [3][6][10].

Group 1: Model Features
- SAM 3 introduces a new task paradigm called Promptable Concept Segmentation (PCS), enabling the model to segment instances in images or videos based on phrases or image examples [11][12].
- The model supports open vocabulary, allowing users to input any noun phrase as a segmentation target, and can maintain identity consistency across video frames [17].
- SAM 3's architecture includes a Presence Head module that decouples object recognition and localization tasks, improving performance in multi-instance segmentation [16][17].

Group 2: Data Engine and Benchmark
- A scalable data engine was built to enhance PCS, generating a training dataset with 4 million unique concept labels and 52 million verified masks [19].
- The SA-Co benchmark was introduced to evaluate the model's performance in open vocabulary segmentation tasks. It contains 214,000 unique concepts, more than 50 times the coverage of existing benchmarks [23][24].

Group 3: Performance Metrics
- SAM 3 achieved a 47.0% accuracy in zero-shot segmentation tasks on the LVIS dataset, surpassing the previous state-of-the-art (SOTA) of 38.5% [28].
- In the new SA-Co benchmark, SAM 3's performance was at least twice as strong as baseline methods [29].
- The model demonstrated superior performance in video segmentation tasks compared to its predecessor, SAM 2 [30].

Group 4: Real-time Processing
- SAM 3 can process images with over 100 entities in approximately 30 milliseconds on H200 GPUs, maintaining near real-time performance for about five concurrent targets in video tasks [35].

Group 5: Limitations
- The model struggles to generalize zero-shot to specialized fields such as medical imaging and thermal imaging [36].
- In multi-target scenarios during video segmentation tasks, the model's real-time performance may decline, necessitating multi-GPU parallel processing [37].
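
The Presence Head point under Group 1 can be made concrete with a small sketch. It assumes one plausible reading of "decoupling recognition and localization": a single global score answers "is this concept present in the image at all?", each candidate mask carries its own match score, and the two are multiplied. The function name, the threshold, and the numbers are hypothetical and are not taken from Meta's code.

```python
def score_detections(presence_prob, match_probs, threshold=0.5):
    """Combine one global presence score with per-candidate match scores.

    presence_prob: assumed probability that the prompted concept appears
                   anywhere in the image (recognition).
    match_probs:   assumed per-candidate probabilities that a given mask
                   matches the concept, if it is present (localization).
    Returns (candidate_index, final_score) pairs that pass the threshold.
    """
    results = []
    for idx, match_prob in enumerate(match_probs):
        final = presence_prob * match_prob  # recognition x localization
        if final >= threshold:
            results.append((idx, final))
    return results


# Concept is very likely present: strong candidates survive.
kept = score_detections(0.9, [0.8, 0.3, 0.95])

# Concept is very likely absent: every candidate is suppressed,
# even ones whose masks look plausible in isolation.
suppressed = score_detections(0.1, [0.8, 0.3, 0.95])
```

Under this reading, a low presence score vetoes all candidates at once, which is one way such a design could reduce false positives when many similar instances compete.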
An "AI LeCun" Presents the Paper Itself: A Self-Evolving Agent Framework Generates Polished Presentation Videos
QbitAI (量子位) · 2025-10-13 01:35
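Contributed by the EvoPresent team
QbitAI | Official account QbitAI

AI can now explain a paper on its own and generate better-looking slides while doing so.

Researchers at the University of California, Santa Barbara (UCSB) and Santa Cruz (UCSC) have proposed EvoPresent, a self-evolving agent framework for academic presentations that lets AI not only "explain the paper clearly" but also "present it attractively."

From logic to aesthetics: the bottleneck in automating research presentations

Many systems can already convert a paper into slides or a poster automatically, but they still share three major limitations: a monotonous narrative, rigid design, and no feedback.

AI tends to extract content mechanically along the paper's own structure, so the delivery lacks rhythm. Template-based design adapts poorly to different styles and often produces clashing colors and crowded layouts. Once generation ends, the system cannot judge where the result looks bad, let alone correct itself.

These shortcomings make AI presentations feel cold and mechanical, unable to balance logic with visual appeal.

EvoPresent proposes a new path: let the AI, like a human presenter, reflect while generating and evolve through that reflection.

The researchers use the Group Relative Policy Optimization (GRPO) algorithm and train the model on human preference data, so that it gradually develops interpretable aesthetic reasoning from feedback. Unlike traditional supervised learning, this approach lets the model not only "give a score" but also explain why, for example "the heading hierarchy is unclear" or "the spacing between text and images ...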
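
For readers unfamiliar with GRPO, the sketch below shows only its core idea, the group-relative advantage: several candidate outputs for the same input are scored, and each score is normalized against the mean and standard deviation of its own group. This is a minimal illustration, not the EvoPresent training code. The reward values are invented, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics.

    rewards: scores for several candidates generated from the same input
             (hypothetical aesthetic-preference scores here).
    Returns one advantage per candidate: positive means better than the
    group average, negative means worse.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids 0-division
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical preference scores for four candidate slide designs.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# If every candidate scores the same, no candidate is preferred.
flat = group_relative_advantages([1.0, 1.0])
```

Because the baseline comes from the group itself, this style of update needs no separately trained value network, which is the usual reason given for choosing GRPO over critic-based policy-gradient methods.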
Musk Poaches Nvidia Talent to Build AI Games! Step One: Develop a World Model
QbitAI (量子位) · 2025-10-13 01:35
Core Viewpoint
- xAI, founded by Elon Musk, is entering the competitive field of world models, aiming to leverage expertise from Nvidia to enhance its capabilities in AI-generated gaming by 2026 [1][2][7].

Group 1: xAI's Entry into World Models
- xAI has recruited several senior researchers from Nvidia to strengthen its position in the world model arena, which has become a battleground for major AI companies [1][7].
- The first step for xAI involves hiring researchers like Zeeshan Patel and Ethan He, who have significant experience in deep learning and generative models [9][10][18].
- Both researchers previously contributed to Nvidia's Omniverse platform, which is a leading simulation platform that aligns well with the requirements of world model training [21][22][25].

Group 2: Objectives and Applications
- The concept of world models allows AI to simulate environments internally, which is seen as a foundational element for achieving Artificial General Intelligence (AGI) [26][27].
- xAI's initial focus within the world model framework is likely to be on video games, aiming to create AI that can generate adaptive and realistic 3D environments based on player interactions [33][34].
- The recruitment of a multimodal team indicates xAI's commitment to integrating various forms of media, such as audio and video, into its AI systems [37][40].

Group 3: Strategic Vision
- Musk has articulated that xAI's mission is to enable AI to understand the essence of the universe, with world models being a critical pathway to this understanding [41][42].
- The interconnectedness of xAI, Tesla, and Neuralink suggests a strategic vision where data and insights from these entities could create a comprehensive AI ecosystem [44][45].
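General-Purpose Models Cannot Fully Understand Users; the Next Stop for AI Products Is the Battle over Context | A Conversation with AI Knowledge Assistant remio
QbitAI (量子位) · 2025-10-12 07:30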
Core Insights
- The article discusses the evolution of AI productivity tools, emphasizing the importance of deep user understanding and personalized experiences in enhancing user engagement and satisfaction [3][4].
- The AI knowledge assistant remio is highlighted as a product that aims to function as a "second brain," providing seamless and automated information management for users [5][11].

Group 1: Product Features and Differentiation
- Remio's core functionalities include automatic information capture, intelligent knowledge management, and AI-assisted content creation, all designed to streamline knowledge workflows [11][12].
- The product prioritizes user privacy by storing all data locally on the user's device, eliminating concerns associated with cloud storage [12][23].
- Remio's unique selling proposition lies in its ability to understand and synchronize with the user's information needs, making it particularly effective for complex knowledge workers [19][21].

Group 2: Target Market and User Segmentation
- The target audience for remio consists of complex knowledge workers, including managers, engineers, and consultants, with an estimated global market size of several hundred million [25][27].
- Simple knowledge workers, such as customer service representatives, are expected to be replaced by AI agents, highlighting the growing demand for tools that cater to more sophisticated tasks [25][26].

Group 3: User Engagement and Activation
- New user activation is identified as a critical metric for remio, with efforts focused on educating users about the long-term benefits of investing time in the product [31][33].
- The company is exploring various strategies to enhance user onboarding, including the introduction of a "prompt repository" to showcase the potential of the product [34][35].

Group 4: Competitive Landscape and Barriers to Entry
- Remio's approach to data storage and user privacy creates a significant barrier to entry against larger internet companies that typically rely on cloud-based data models [50][52].
- The product's focus on local data storage and understanding the value of personal data differentiates it from traditional collaborative platforms [50][52].

Group 5: Future Development and Iteration
- Short-term development priorities include integrating comprehensive office data and expanding compatibility to Windows users, aiming to enhance the product's utility [54][55].
- The company acknowledges the need for continuous user feedback and iterative improvements to meet evolving user needs and expectations [39][49].
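Douyin and LV-NUS Open-Source a New Multimodal Model: Punching Above Its Weight to Set a New SOTA, with 8B Reasoning on Par with GPT-4o
QbitAI (量子位) · 2025-10-12 07:30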
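Contributed by the SAIL-VL2 team
QbitAI | Official account QbitAI

The 2B model ranks first among open-source models under 4B parameters on multiple benchmarks.

SAIL-VL2 is a multimodal large model jointly released by Douyin's SAIL team and LV-NUS Lab.

At small-to-medium parameter scales such as 2B and 8B, SAIL-VL2 delivers performance breakthroughs across 106 datasets. On complex reasoning benchmarks such as MMMU and MathVista in particular, it surpasses models of the same scale and even rivals closed-source models with more parameters.

In terms of method, SAIL-VL2 innovates along three dimensions, data, training, and architecture, offering the community a new paradigm in which "small models can also be highly capable."

SAIL-VL2 combines fine-grained visual perception with complex-reasoning performance comparable to that of larger models. By open-sourcing the models and inference code, the team also provides an extensible multimodal foundation model.

Pretrain: three core innovations

MoE architecture: balancing parameters and compute

Architecture: sparse MoE plus a flexible encoder, balancing performance and efficiency

SAIL-VL2 moves beyond the traditional dense LLM architecture by introducing a sparse Mixture-of-Experts (MoE) design, and it offers model configurations at several sizes to suit different scenarios:

| Model | Vision Encoder Language Model | #Param | |
| --- | --- | --- | --- |
| | ...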
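
The trade-off that sparse MoE makes between parameter count and compute can be illustrated with a generic top-k routing sketch. This is not SAIL-VL2's implementation: the toy "experts," the router logits, and the choice of k = 2 are all hypothetical, and real models route per token using learned router weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Four toy experts (hypothetical); only two of them run for this input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_logits = [0.1, 2.0, 1.5, -1.0]

routes = top_k_route(router_logits, k=2)
y = moe_forward(3.0, experts, router_logits, k=2)
```

All four experts contribute to the model's total parameter count, but each input pays the compute cost of only two of them, which is the sense in which a sparse MoE balances parameters against computation.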
2025 AI Annual Awards Now Open! 3 Dimensions, 5 Award Categories: Seeking the Leaders of the AI+ Era
QbitAI (量子位) · 2025-10-12 04:07
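By the Organizing Committee, from Aofeisi
QbitAI | Official account QbitAI

To let more practitioners experience the leap of the intelligence wave, and to offer applause and encouragement to more peers on the same path, we are officially opening applications for the "2025 AI Annual Rankings."

This is the 8th year of QbitAI's annual AI rankings. Over these eight years we have witnessed technical breakthroughs and real-world deployment, the convergence and reshaping of industries, and wave after wave of companies, people, and products that pushed the era forward.

In an era in which AI is redefining everything, intelligent technology is no longer a single tool but a driving force in the co-evolution of industry and society. Through this annual selection we hope to discover and honor the explorers and practitioners who are truly leading change and pushing boundaries.

The selection covers three dimensions, companies, products, and people, with five award categories. Companies are warmly invited to apply!

Let us witness the stars of the year together and light the way forward.

Detailed selection criteria and application instructions are as follows.

Company list
- 2025 AI Leading Enterprise of the Year
- 2025 AI Promising Startup of the Year: focuses on innovative and entrepreneurial forces in China's AI field and will select the AI startups with the greatest investment value and growth potential.

People list
- 2025 AI Focus Figure of the Year

Product list
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution ...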
Hinton's Provocative Claim: AI Is Already Conscious; It Just Doesn't Know It
QbitAI (量子位) · 2025-10-12 04:07
Core Viewpoint
- The article discusses Geoffrey Hinton's perspective on artificial intelligence (AI), suggesting that AI may already possess a form of "subjective experience" or consciousness, albeit unrecognized by itself [1][56].

Group 1: AI Consciousness and Understanding
- Hinton posits that AI might have a nascent form of consciousness, which is misunderstood by humans [2][3].
- He emphasizes that AI has evolved from keyword-based search systems to tools that can understand human intentions [10][14].
- Modern large language models (LLMs) exhibit capabilities that are close to human expertise in various subjects [15].

Group 2: Neural Networks and Learning Mechanisms
- Hinton explains the distinction between machine learning and neural networks, with the latter inspired by the human brain's functioning [17][21].
- He describes how neural networks learn by adjusting the strength of connections between neurons, similar to how the brain operates [21][20].
- The breakthrough of backpropagation in 1986 allowed for efficient training of neural networks, significantly enhancing their capabilities [38][40].

Group 3: Language Models and Cognitive Processes
- Hinton elaborates on how LLMs process language, drawing parallels to human cognitive processes [46][47].
- He asserts that LLMs do not merely memorize but engage in a predictive process that resembles human thought [48][49].
- The training of LLMs involves a cycle of prediction and correction, enabling them to learn semantic understanding [49][55].

Group 4: AI Risks and Ethical Considerations
- Hinton highlights potential risks associated with AI, including misuse for generating false information and societal instability [68][70].
- He stresses the importance of regulatory measures to mitigate these risks and ensure AI aligns with human interests [72][75].
- Hinton warns that the most significant threat from advanced AI may not be rebellion but rather its ability to persuade humans [66].

Group 5: Global AI Landscape and Competition
- Hinton comments on the AI competition between the U.S. and China, noting that while the U.S. currently leads, its advantage is diminishing due to reduced funding for foundational research [78][80].
- He acknowledges China's proactive approach in fostering AI startups, which may lead to significant advancements in the field [82].
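
The idea in Group 2 that a network learns by adjusting connection strengths, and the "prediction and correction" cycle in Group 3, can be illustrated with the smallest possible case: one connection weight trained by gradient descent. This is a toy sketch, not Hinton's own example. Backpropagation applies the same error-driven update to every weight in a multi-layer network via the chain rule. The data, learning rate, and step count below are arbitrary assumptions.

```python
def train_connection(samples, lr=0.1, steps=50):
    """Learn a single connection weight w so that w * x approximates target."""
    w = 0.0  # start with no connection strength
    for _ in range(steps):
        for x, target in samples:
            prediction = w * x           # predict
            error = prediction - target  # compare with the correct answer
            grad = 2 * error * x         # d/dw of (w * x - target) ** 2
            w -= lr * grad               # correct: nudge the weight downhill
    return w


# Toy data generated by the rule target = 2 * x (hypothetical).
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_connection(samples)
```

After training, `w` settles very close to 2.0: the weight has been strengthened until predictions match the targets, which is the same predict-then-correct loop, at vastly larger scale, that the summary describes for LLM training.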