机器之心
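Shanghai Chuangzhi Academy Jingzhi Talent Forum (菁智人才论坛) | A Call to Top Young Talent in China and Abroad, with a Briefing on the Overseas Excellence (海优) Policy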
机器之心· 2025-12-17 02:05
Core Viewpoint
- Shanghai Chuangzhi Academy aims to create an innovative ecosystem that encourages value creation and provides abundant resources for talent in the field of artificial intelligence [3][10].

Group 1: Event Overview
- The "Super MVP" Talent Forum is scheduled for December 26-27, 2025, and late January 2026, combining online and offline formats [5].
- The forum invites top young talent worldwide from leading universities and research institutions in AI-related fields [6].

Group 2: Talent Requirements
- The academy seeks PhD candidates or recent graduates in AI-related disciplines such as computer science, mathematics, and physics from top universities [6].
- Candidates should hold formal teaching or research positions at prestigious institutions, or work in R&D roles at leading companies or startups [6].

Group 3: Institutional Background
- Established in July 2024, Shanghai Chuangzhi Academy is a joint initiative of the Ministry of Education and Shanghai to explore the cultivation of high-level talent [10].
- The academy takes a student-centered approach and aims to become a hub for AI innovation [10].

Group 4: Support and Resources
- The academy offers substantial computational resources, collaboration with mentors from top universities, and a strong engineering team [18].
- It emphasizes a flat organizational structure that promotes collaboration among independent principal investigators, students, and industry mentors [18].

Group 5: Compensation and Benefits
- Starting salaries for positions at the academy are at the million level, with additional benefits including discounted rent for furnished housing and access to top educational and medical resources in Shanghai [20].
Zhejiang University Teams Up with ByteDance: Open-Sourcing OpenVE-3M, a Large-Scale Instruction-Following Video Editing Dataset
机器之心· 2025-12-17 00:00
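The authors of this work are from Zhejiang University and ByteDance. The first author, He Haoyang (何昊阳), is a PhD student at Zhejiang University whose research focuses on video generation and editing. The corresponding author is Professor Xie Lei (谢磊) of Zhejiang University.

Highlights

Paper title: OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

1. The authors present OpenVE-3M, a large-scale, high-quality, multi-category dataset for instruction-following video editing. It contains 3M sample pairs organized into 2 major categories (spatially aligned and non-spatially aligned) and 8 subcategories.
2. The authors propose a stable pipeline for constructing high-quality, multi-category instruction-following video editing data. It ensures editing quality while preserving diversity, in support of community research.
3. The authors propose OpenVE-Edit, an efficient and effective instruction-following video editing model that achieves SoTA with only 5B parameters and outperforms existing open-source 14B models.
4. The authors propose a general, multi-category, and challenging benchmark for instruction-following video editing. It evaluates model performance in each category along 3 key dimensions and aligns closely with human judgment.

1. Research Motivation

Existing instruction-following video editing datasets such as InsViE-1M, Senorita-2M, and Ditto-1M mainly suffer from dataset ...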
Just Now: OpenAI Launches the All-New ChatGPT Images, and Altman Shows Off His Abs to Promote It
机器之心· 2025-12-17 00:00
Core Viewpoint
- OpenAI has launched a new version of ChatGPT Images, enhancing image generation and editing capabilities, aiming to simplify user interaction and broaden accessibility in creative processes [10][34][44].

Group 1: New Features and Improvements
- The new ChatGPT Images is powered by OpenAI's flagship image generation model, offering precise editing while maintaining key details, with a fourfold increase in image generation speed [10][11].
- The model excels in various editing types, including adding, removing, combining, and replacing elements, allowing for detailed transformations while preserving important aspects of the original image [12][15].
- Enhanced instruction adherence enables the model to follow user commands more reliably, resulting in more accurate edits and better handling of complex compositions [24].

Group 2: User Experience and Accessibility
- The updated Images feature is designed to make the image generation experience more enjoyable and effortless, with numerous preset filters and prompts to inspire creativity [34][44].
- The new model is available to all ChatGPT users and offers a 20% reduction in image input and output costs compared to the previous version, allowing for more image generation within the same budget [37].
- OpenAI aims to lower the psychological barrier for users by introducing an independent "Images" entry point and simplifying the interaction process, making it as easy as posting on social media [44].

Group 3: Competitive Landscape
- The release of ChatGPT Images signifies a shift in the competitive landscape of AI image generation, moving from a focus on model capabilities to a comprehensive product experience [43].
- OpenAI has not released quantitative benchmark results for this update, indicating a strategic emphasis on user experience rather than purely technical performance metrics [43].
Has PPO-Clip's "Blind Spot" Been Filled? Kuaishou Proposes Entropy Ratio Clipping, a Key Leap from Local Constraints to Global Stability
机器之心· 2025-12-16 10:22
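This research was carried out by the large language model team at Kuaishou Technology; the core authors include Su Zhenpeng (苏振鹏) and Pan Leiyu (潘雷宇). The team focuses on foundation LLM development and on frontier directions such as Agent RL, pragmatically exploring the capability boundaries of AGI and continually advancing new AI technologies and products. The team has previously open-sourced models including Klear-46B-A2.5B and Klear-Reasoner-8B; Klear-Reasoner-8B achieved SOTA results among models of the same parameter scale on math and code benchmarks.

In the post-training stage of large language models, reinforcement learning has become the core paradigm for improving model capability and alignment quality. In the widely used off-policy training paradigm, however, the data used to update the current policy is generated by an older behavior policy, which causes distribution drift. This often pushes the policy outside the trust region and makes reinforcement learning training unstable.

Although PPO mitigates part of the problem by clipping the importance sampling ratio, it can only constrain probability changes for sampled actions and ignores the global distribution drift over unsampled actions. To address these challenges, the Kuaishou research team proposes a novel entropy ratio clipping method. Approaching the problem from a new angle, the method stabilizes the global distribution by constraining the relative change in policy entropy, giving reinforcement learning training a more reliable means of control.

Research Background

Reinforcement learning training has long faced ...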
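The contrast between PPO's per-action clipping and a constraint on policy entropy can be made concrete with a small sketch. This is an illustration under stated assumptions, not the Kuaishou team's implementation: the PPO-Clip term is the standard one, but `erc_token_objective`, the bounds `beta_low` and `beta_high`, and the choice to drop a token's contribution when its entropy ratio leaves the allowed band are all hypothetical, because the excerpt does not give the method's exact form.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_clip_term(new_p, old_p, advantage, eps=0.2):
    """Standard PPO-Clip surrogate for one sampled action."""
    ratio = new_p / old_p
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def entropy_ratio(new_probs, old_probs):
    """Relative change in policy entropy over the whole action distribution."""
    h_old = entropy(old_probs)
    if h_old == 0.0:
        raise ValueError("old policy is deterministic; entropy ratio undefined")
    return entropy(new_probs) / h_old


def erc_token_objective(new_probs, old_probs, action, advantage,
                        eps=0.2, beta_low=0.1, beta_high=0.1):
    """Hypothetical combination: keep the PPO-Clip term only while the
    entropy ratio stays inside [1 - beta_low, 1 + beta_high] (assumption)."""
    r = entropy_ratio(new_probs, old_probs)
    if r < 1.0 - beta_low or r > 1.0 + beta_high:
        return 0.0  # token contributes nothing to the update (assumption)
    return ppo_clip_term(new_probs[action], old_probs[action], advantage, eps)


if __name__ == "__main__":
    old = [0.5, 0.3, 0.2]
    new = [0.5, 0.49, 0.01]  # sampled action 0 is unchanged; the rest shifted
    print(ppo_clip_term(new[0], old[0], advantage=1.0))
    print(entropy_ratio(new, old))
    print(erc_token_objective(new, old, action=0, advantage=1.0))
```

In this toy case the sampled action's probability is unchanged, so the importance ratio is exactly 1 and PPO-Clip does not intervene, yet probability mass has moved sharply among the unsampled actions and the entropy ratio falls to about 0.72. That is the kind of drift a per-action ratio cannot see.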
无问芯穹 Unveils Its Agent Service Platform for the First Time, Using Infrastructure to Accelerate Enterprise-Grade "Agent Freedom"
机器之心· 2025-12-16 10:22
Core Viewpoint
- The future of enterprises will be characterized by the integration of multiple intelligent agents, significantly amplifying organizational creativity and impact, as stated by the CEO of 无问芯穹 (Wuwen Xinqiong) [1].

Group 1: Intelligent Agent Ecosystem
- The 无问芯穹 Intelligent Agent Service Platform was officially launched to provide comprehensive support for enterprises in the intelligent agent era, from customization to commercialization [3].
- The platform aims to bridge the gap between infrastructure and intelligent agent development needs, addressing key challenges such as achieving production-level effectiveness and controlling costs [7][12].

Group 2: Core Competitiveness in the Intelligent Era
- The transition to the intelligent agent era accelerates the scaling of enterprise creativity, compressing the timeline from idea to industry [5].
- The platform offers ready-to-use agent capability templates and reliable hosting services, enhancing the effectiveness and stability of intelligent agent operations [9].

Group 3: Cost Control and Efficiency
- The platform integrates deeply with underlying infrastructure to help enterprises flexibly control the costs of deploying intelligent agents, achieving efficiency improvements of 3 to 5 times compared to traditional service models [14].
- It supports the integration of various tools, eliminating over 70% of the redundant labor in agent tool integration [16].

Group 4: Real-World Applications and Impact
- The platform has been validated through collaborations with industry partners, exemplified by the development of the "SysCoding Agent" for enterprise system development, which achieved over 95% completeness in its initial output [19][21].
- The intelligent agent service model is being applied across various industries, providing efficient and agile services that translate industry knowledge into long-term business value [23].

Group 5: Future Vision
- 无问芯穹 aims to be a long-term partner for enterprises in the intelligent agent transformation process, focusing on converting organizational knowledge into sustainable value and defining the next generation of production paradigms [25].
- The company emphasizes the importance of collaboration between academia and industry to create a closed loop of innovation and industry development [27].
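Is NVIDIA the New King of Open Source? Nemotron 3 Debuts a New Hybrid Mixture-of-Experts Architecture with 4x Higher Inference Efficiency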
机器之心· 2025-12-16 08:55
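By the 机器之心 editorial team

NVIDIA's in-house large models have just received a major version update.

In the early hours of today, Beijing time, NVIDIA released the Nemotron 3 family of open models in three sizes: Nano, Super, and Ultra.

NVIDIA believes that as enterprises move from single-model chatbots to multi-agent AI systems that work together, developers face challenges such as high communication overhead, context drift, and persistently high inference costs. At the same time, models that can support the automation of complex workflows must be sufficiently transparent and interpretable to earn the trust of developers and enterprises.

Of the three, Nemotron 3 Nano is already available on Hugging Face. It is the most compute-cost-efficient model to date, optimized for tasks such as software debugging, content summarization, AI assistant workflows, and information retrieval, and it can significantly reduce inference costs. The model uses a distinctive hybrid MoE architecture that delivers significant gains in efficiency and scalability.

Nemotron 3 Nano has 31.6 billion total parameters and 3.2 billion active parameters (3.6 billion including the embedding layer). In each forward pass it activates fewer than half as many parameters as the previous-generation Nemotron 2 Nano, yet it achieves higher accuracy.

Compared with Nemotron 2 Nano, Nemotron 3 Nano achieves up to 4x the To ...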
Every Large Model Is Learning Physics: A Study from Peking University's Physics Department Shakes the AI Community
机器之心· 2025-12-16 08:55
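Editors | +0, Zenan (泽南), Panda

LLM agents are impressive and are becoming a powerful paradigm for solving complex problems.

Paper title: Detailed balance in large language model-driven agents
Paper link: https://arxiv.org/pdf/2512.10047

In short, the researchers experimentally measured the transition probabilities between states generated by an LLM. From these measurements they found, statistically, a detailed balance phenomenon in LLM generation transitions.

This suggests that LLM generation may not work by learning rule sets and strategies in general, but by implicitly learning a class of underlying potential functions, which may transcend different LLM architectures and prompt templates.

So far, however, this success remains largely at the level of "empirical" engineering practice: we know it works, but we often do not know why it operates as it does at the macroscopic level. Can we find a theoretical framework that, in the way physics describes nature, lets us understand and unify the macroscopic dynamics of agents?

To open this black box, Peking University's School of Physics, its Center for High Energy Physics, and the Beijing Computational Science Research Center recently joined forces and borrowed, across disciplines, from physics the ...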
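The detailed balance condition can be illustrated with a toy Markov chain. The sketch below uses the textbook definition, pi_i * P[i][j] == pi_j * P[j][i], and a Metropolis-style chain built from a potential V with pi proportional to exp(-V). It is not the paper's measurement procedure: the three-state potential, the function names, and the Metropolis construction are assumptions chosen only to show how a potential function and transition probabilities relate.

```python
import math


def metropolis_matrix(potential):
    """Transition matrix that proposes another state uniformly and accepts
    with probability min(1, exp(V_i - V_j)). Satisfies detailed balance."""
    n = len(potential)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(potential[i] - potential[j]))
                P[i][j] = accept / (n - 1)
        P[i][i] = 1.0 - sum(P[i])  # remaining mass stays in place
    return P


def boltzmann(potential):
    """Stationary distribution pi_i proportional to exp(-V_i)."""
    weights = [math.exp(-v) for v in potential]
    z = sum(weights)
    return [w / z for w in weights]


def max_detailed_balance_violation(pi, P):
    """Largest |pi_i P_ij - pi_j P_ji| over all state pairs (0 if balanced)."""
    n = len(pi)
    return max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
               for i in range(n) for j in range(n))


def potential_gap(P, i, j):
    """Under detailed balance, log(P_ij / P_ji) recovers V_i - V_j."""
    return math.log(P[i][j] / P[j][i])


if __name__ == "__main__":
    V = [0.0, 1.0, 2.5]  # hypothetical potential over three states
    P = metropolis_matrix(V)
    pi = boltzmann(V)
    print(max_detailed_balance_violation(pi, P))
    print(potential_gap(P, 0, 2))
```

The last function shows why detailed balance is informative: when it holds, the ratio of forward to backward transition probabilities depends only on a difference of potentials, so a potential can in principle be read off from measured transition statistics. A chain that cycles deterministically through its states has the same uniform stationary distribution as a symmetric one, but it violates the condition.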
Beating ChatGPT-5 in a Clinical Head-to-Head! A Chinese Team Builds the First OCT Imaging AI System
机器之心· 2025-12-16 04:11
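Released by 机器之心

The breakneck advance of general-purpose large language models (LLMs) has finally hit a hard wall in the "last mile" of the medical vertical. ChatGPT performs well on the USMLE (United States Medical Licensing Examination), but how do general-purpose models actually fare at the cardiac operating table, where a sharp eye and millimeter-level precision are required?

Recently, the COMPARE study, led by Professor Li Yan's (李妍) team at Tangdu Hospital of Air Force Medical University and completed jointly with Zhu Rui's (朱锐) team at the Research Institute of Tsinghua University in Shenzhen, was posted as a preprint on arXiv. The study finds that in decision-making for percutaneous coronary intervention (PCI), the domain-specific CA-GPT system (an AI system based on OCT imaging) significantly outperforms OpenAI's general-purpose model ChatGPT-5 on key decision metrics. The system studied is a RAG-enhanced AI-OCT integrated decision-support model built on the OCT system from 中科微光医疗 (Vivolight Medtech).

This is not only a victory for an algorithm; to some extent it can be called the "DeepSeek moment" for intravascular imaging in China. The CA-GPT system is expected to redefine the standard of intelligence for cardiac interventional procedures.

01. Showdown at the Summit: General-Purpose Models Struggle on the Specialist Battlefield

According to the 2023 Global Burden of Cardiovascular Disease Report, deaths from cardiovascular disease each year ...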
56x Faster Generative Policies: Xi'an Jiaotong University Proposes EfficientFlow, Moving Toward Efficient Embodied Intelligence
机器之心· 2025-12-16 04:11
Core Insights
- The article discusses EfficientFlow, a new generative policy learning method that addresses two major challenges in embodied AI: reliance on large-scale demonstration data and slow inference [2][3].

Group 1: Technical Highlights
- EfficientFlow integrates equivariant modeling with efficient flow matching, significantly improving data efficiency and reducing the number of iterations required for inference, and achieves state-of-the-art (SOTA) performance across multiple robotic manipulation benchmarks [2][19].
- The method adds an acceleration regularization term to the loss function to encourage smoother and faster trajectory generation, motivated by the physical intuition that smooth movements typically have low acceleration [6][19].
- The model uses equivariant networks that let it generalize learned actions across orientations: what it learns from one viewpoint carries over to rotated versions of the scene, which effectively multiplies the usefulness of each demonstration [11][19].

Group 2: Inference Efficiency
- EfficientFlow performs close to existing SOTA methods while using significantly less data and far fewer iterations. With a single inference step it comes close to the performance of EquiDiff run for 100 iterations, a 56-fold speedup for single-step inference and nearly 20 times faster for 5-step inference [19].
- The model uses a temporal consistency strategy to keep action sequences coherent during execution, relying on overlapping predictions to maintain continuity of behavior [15][19].
- Periodic resets let the model explore diverse behaviors while maintaining temporal consistency, with minimal additional overhead during inference [17][19].
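The link between low acceleration and few-step inference can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not EfficientFlow's actual loss: it uses the common straight-line flow matching target `x1 - x0`, approximates acceleration by a finite difference of the predicted velocity along the path, and weights it with a hypothetical coefficient `lam`. The two velocity fields are hand-written stand-ins for a trained policy network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to action x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Squared error between predicted velocity and the target x1 - x0."""
    pred = velocity_fn(interpolate(x0, x1, t), t)
    return sum((p - (b - a)) ** 2 for p, a, b in zip(pred, x0, x1))


def acceleration_penalty(velocity_fn, x0, x1, t, dt=1e-2):
    """Finite-difference estimate of squared acceleration along the path
    (assumed form; the article does not give the exact regularizer)."""
    v_now = velocity_fn(interpolate(x0, x1, t), t)
    v_next = velocity_fn(interpolate(x0, x1, t + dt), t + dt)
    return sum(((vn - vc) / dt) ** 2 for vn, vc in zip(v_next, v_now))


def regularized_loss(velocity_fn, x0, x1, t, lam=0.1, dt=1e-2):
    """Flow matching loss plus a weighted acceleration penalty."""
    return (flow_matching_loss(velocity_fn, x0, x1, t)
            + lam * acceleration_penalty(velocity_fn, x0, x1, t, dt))


def euler_sample(velocity_fn, x0, steps):
    """Integrate the velocity field from t=0 to t=1 with `steps` Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_straight_field(x0, x1):
    """Hypothetical ideal field: constant velocity, zero acceleration."""
    return lambda x, t: [b - a for a, b in zip(x0, x1)]


def make_curved_field(x0, x1):
    """Hypothetical field with the same endpoint but time-varying speed."""
    return lambda x, t: [2.0 * t * (b - a) for a, b in zip(x0, x1)]


if __name__ == "__main__":
    x0, x1 = [0.0, 0.0], [1.0, 2.0]
    straight = make_straight_field(x0, x1)
    curved = make_curved_field(x0, x1)
    print(euler_sample(straight, x0, steps=1))
    print(euler_sample(curved, x0, steps=1))
    print(acceleration_penalty(curved, x0, x1, t=0.3))
```

With the constant-velocity field a single Euler step lands exactly on the target, while the time-varying field needs many steps to get close and is the one the acceleration penalty punishes. This is one plausible reading of why penalizing acceleration would help single-step inference.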
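Alimama Releases MUSE: Handling Ultra-Long Behavior Sequences on the Order of 100,000 Items with Multimodality, and Open-Sources the Taobao-MM Dataset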
机器之心· 2025-12-16 04:11
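Released by 机器之心

If every footprint a user leaves on the internet is a memory, then most of today's recommender systems suffer from "short-term amnesia."

Constrained by compute and storage, the clicks, favorites, and purchases lying dormant from years ago are often crudely truncated or forgotten. Even when they are retrieved, the model sees them only as strings of cold, mutually unrelated ID codes.

Yet the genuinely interesting signals are often hidden in this forgotten "long tail." How can this dormant data, on the order of 100,000 items, be awakened, and how can the visual and semantic associations behind it be understood?

The answer from Alimama (阿里妈妈) and a Wuhan University team is MUSE (MUltimodal SEarch-based framework). It is not just a new CTR model; it is more like a "multimodal hippocampus" installed in a recommender system. Drawing on the semantic power of images and text, it reconstructs a user's interest graph across time and space.

They have even open-sourced the foundation for building this "digital brain": the Taobao-MM dataset.

For the long-standing technical evolution of recommender systems, this breakthrough amounts to a profound rethinking and reconstruction.

Paper title: MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling

In search and recommendation ...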