机器之心

A 5-Million-Video Dataset Plus a Brand-New Evaluation Framework! Peking University Open-Sources OpenS2V-Nexus, New Infrastructure for Subject-Consistent Video Generation That Makes Videos Both "Faithful" and "Natural"
机器之心· 2025-07-08 09:41
Want an AI that can "look at your selfie and generate a consistent, natural short video"? That is exactly the problem Subject-to-Video (S2V) generation sets out to solve: the generated video must not only follow the text prompt but also faithfully preserve the features of a specified person or object, so the result looks both "like the subject" and "natural". This capability matters greatly for short-video generation, virtual humans, AI editing, and more. However, training and evaluating such models has long been hampered by the lack of publicly available large-scale datasets and fine-grained evaluation benchmarks, limiting rapid progress in S2V. To address this, a Peking University team has released OpenS2V-Nexus, a new open-source suite built specifically for S2V generation:

- OpenS2V-Eval: the world's first fine-grained S2V benchmark covering subject consistency, naturalness, and text alignment, making different models genuinely comparable on subject consistency.
- OpenS2V-5M: the world's first publicly available dataset of 5 million high-quality 720P subject-text-video triplets, covering both real and synthetic data, to help researchers train stronger generative models faster.

The team also ran a systematic evaluation of 18 representative S2V models, revealing for the first time the real capability gaps of mainstream models in maintaining subject consistency and naturalness. With OpenS2V-Nexus, AI video generation research no longer has to feel its way in the dark; training becomes more efficient and evaluation more ...
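As a rough illustration of what a subject-consistency score might look like in a benchmark of this kind, the sketch below embeds the reference subject image and sampled video frames with a CLIP vision encoder and averages their cosine similarities. This is a generic stand-in under assumed tooling (the transformers CLIP checkpoint named below), not OpenS2V-Eval's actual metric.

```python
# Generic sketch of a subject-consistency score: mean cosine similarity
# between a CLIP embedding of the reference subject image and embeddings of
# sampled video frames. Illustrative stand-in, not OpenS2V-Eval.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed(images):
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize

def subject_consistency(reference: Image.Image, frames: list[Image.Image]) -> float:
    ref = embed([reference])            # (1, d)
    frm = embed(frames)                 # (n, d)
    return (frm @ ref.T).mean().item()  # mean cosine similarity in [-1, 1]

# Usage: frames would be sampled uniformly from the generated video, e.g.
# score = subject_consistency(Image.open("subject.jpg"), sampled_frames)
```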
Still Struggling with AI Data? Wentao Zhang and Academician Weinan E's Team Release a Data-Centric AI System
机器之心· 2025-07-08 09:41
In recent years, progress on large models has been driven mainly by major technology companies, whose core advantage lies in massive, high-quality data resources. These companies, however, generally do not release their raw data or data-processing tools, making it hard for academia to catch up on constructing and optimizing training data for large models. Although many datasets have been open-sourced in recent years, academia still faces numerous challenges in data preparation. Today, cleaning and constructing training data still largely depends on each research team reinventing the wheel behind closed doors, without systematic, efficient tooling. Existing data-processing tools such as Hadoop and Spark mostly support traditional operators and have not yet effectively integrated intelligent operators based on the latest large language models (LLMs), offering limited support for building training data for advanced models. To address this, the team of Wentao Zhang and Academician Weinan E proposed DataFlow, a data-centric AI system. It implements more than 100 data-governance operators based on rules, local large models, or large-model APIs, and on top of these builds 8 preset data-processing pipelines covering mainstream data-governance needs: cleaning, augmenting, and evaluating large-scale noisy data (PDF documents, plain text, low-quality Q&A data, web-crawled data, etc.); synthesizing strong-reasoning data with chains of thought; RAG data extraction and synthesis; and more. The system lets users flexibly compose existing operators and develop new operators ...
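To make the operator-and-pipeline idea concrete, here is a minimal sketch of how a rule-based operator and an LLM-backed quality filter might be composed into a cleaning pipeline. All class and function names (Operator, Pipeline, dedup, llm_quality_filter) are hypothetical illustrations, not DataFlow's actual API.

```python
# Hypothetical sketch of composing data-governance operators into a pipeline.
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict  # e.g. {"text": "..."}

@dataclass
class Operator:
    name: str
    fn: Callable[[Iterable[Record]], Iterable[Record]]

    def __call__(self, records: Iterable[Record]) -> Iterable[Record]:
        return self.fn(records)

class Pipeline:
    def __init__(self, operators: list[Operator]):
        self.operators = operators

    def run(self, records: Iterable[Record]) -> list[Record]:
        for op in self.operators:        # apply operators in order
            records = op(records)
        return list(records)

# A rule-based operator: drop exact duplicates by text.
def dedup(records):
    seen = set()
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            yield r

# An "intelligent" operator: keep records an LLM scores above a threshold.
def llm_quality_filter(records, threshold=0.3):
    def fake_llm_score(text):            # stub; swap in a local-LLM or API call
        return min(len(text) / 100, 1.0)
    for r in records:
        if fake_llm_score(r["text"]) >= threshold:
            yield r

cleaning_pipeline = Pipeline([
    Operator("dedup", dedup),
    Operator("llm_quality", llm_quality_filter),
])

if __name__ == "__main__":
    data = [{"text": "short"}, {"text": "short"},
            {"text": "a much longer, more informative training sample " * 3}]
    print(cleaning_pipeline.run(data))
```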
KAG-Thinker: A New Paradigm of "Structured" Thinking That Supports Logically Rigorous Complex Reasoning in Large Models
机器之心· 2025-07-08 06:54
Core Viewpoint
- The article discusses the release of the KAG-Thinker model by Ant Group's Knowledge Engine team in collaboration with Zhejiang University and Tongji University, focusing on structured reasoning for complex tasks and enhancing the logical consistency and stability of the reasoning process.

Group 1: Model Development and Features
- KAG-Thinker is an important upgrade of the KAG framework, designed to construct a stable and interpretable reasoning paradigm for complex tasks in both general and specialized fields [1][3]
- The model utilizes a dual semantic representation mechanism of natural language and logical functions to better leverage structured knowledge [3]
- It combines breadth splitting and depth solving to improve the rigor of problem-solving, introducing a knowledge boundary determination mechanism centered on knowledge point alignment [3][10]

Group 2: Performance and Evaluation
- Experimental results show that KAG-Thinker outperforms state-of-the-art deep search methods by an average of 4.1% across seven single-hop and multi-hop reasoning datasets [6][24]
- On single-hop datasets, KAG-Thinker achieved an average improvement of 4.5%, while on multi-hop datasets the improvement was 3.9% [25]
- The model demonstrated effectiveness in specialized fields, particularly in medical question-answering tasks, indicating its potential for fine-tuning in other professional domains [6][39]

Group 3: Framework Integration and Stability
- The KAG framework version 0.8 enhances knowledge base capabilities, supporting structured and unstructured data integration and allowing developers to customize indexing [28][29]
- Integrated with the KAG framework, KAG-Thinker shows an average performance improvement of 3.0% in EM and 3.8% in F1 compared to the standalone Thinker model [31]
- Stability tests indicate that KAG-Thinker 7B outperforms previous versions in consistent problem decomposition, achieving average improvements of 17.9% and 7.6% under common temperature settings [33]
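As a rough sketch of the breadth-splitting / depth-solving idea with its dual natural-language-plus-logical-function representation, the toy example below decomposes a two-hop question into sub-questions that each carry both forms and are solved in order. The data structures, the hard-coded decomposition, and the knowledge-boundary stub are assumptions for illustration, not KAG-Thinker's implementation.

```python
# Hypothetical sketch: breadth splitting into sub-questions with dual
# (natural-language, logical-function) forms, then depth solving in order.
from dataclasses import dataclass

@dataclass
class SubQuestion:
    text: str           # natural-language form
    logic: str          # logical-function form, e.g. Retrieval(s=?, p=?, o=?)

def breadth_split(question: str) -> list[SubQuestion]:
    # In the real system the model produces this split; hard-coded here.
    return [
        SubQuestion("Which country is the Eiffel Tower in?",
                    "Retrieval(s='Eiffel Tower', p='located_in_country', o=?)"),
        SubQuestion("What is the capital of {answer_1}?",
                    "Retrieval(s='{answer_1}', p='capital', o=?)"),
    ]

def depth_solve(sub: SubQuestion, knowledge: dict) -> str:
    # Knowledge-boundary check: use structured knowledge when the sub-question
    # aligns with a known knowledge point, else fall back to the model (stubbed).
    return knowledge.get(sub.logic, "<model-generated answer>")

def answer(question: str, knowledge: dict) -> str:
    result = ""
    for sub in breadth_split(question):
        sub.logic = sub.logic.replace("{answer_1}", result)
        sub.text = sub.text.replace("{answer_1}", result)
        result = depth_solve(sub, knowledge)
    return result

kb = {
    "Retrieval(s='Eiffel Tower', p='located_in_country', o=?)": "France",
    "Retrieval(s='France', p='capital', o=?)": "Paris",
}
print(answer("What is the capital of the country the Eiffel Tower is in?", kb))
```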
Using Hidden Prompts to Coax AI into Giving Papers High Scores: Paper Co-authored by 谢赛宁 (Saining Xie) Called Out; He Admits the Error and Says It Is Absolutely Not Encouraged
机器之心· 2025-07-08 06:54
Core Viewpoint
- The article discusses the ethical implications of embedding prompts in academic papers to influence AI reviews, highlighting a recent incident involving a professor and the need for a reevaluation of academic integrity in the AI era [2][4][15]

Group 1: Incident Overview
- A recent investigation revealed that at least 14 top universities had research papers containing hidden prompts instructing AI to give positive reviews [3]
- The incident involved a paper co-authored by NYU assistant professor 谢赛宁, which was found to contain such a prompt, leading to significant scrutiny [4][6]

Group 2: Professor's Response
- Professor 谢赛宁 acknowledged his responsibility as a co-author and group leader for not thoroughly reviewing all submission documents [10][11]
- He clarified that a visiting student had misunderstood a joke about embedding prompts and applied it to a submitted paper, not realizing the ethical implications [12]

Group 3: Ethical Discussion
- 谢赛宁 emphasized the need for a deeper discussion of research ethics in the age of AI, advocating constructive dialogue rather than personal attacks [15][24]
- The incident raised questions about how the current academic system handles AI in peer review, with some arguing that embedding prompts could be seen as a form of self-protection against AI reviews [20][26]

Group 4: Broader Implications
- The article points out that the increase in AI-generated papers has led to a shortage of reviewers, pushing some to rely on AI for evaluations, which could compromise review quality [30]
- 谢赛宁's case serves as a catalyst for further discussion on establishing reasonable constraints to improve the peer-review environment [31]
V·STAR Top-Talent Program Launches | Not Just Top Pay and Equity, but Defining the Next-Generation 3D Paradigm Together with VAST
机器之心· 2025-07-08 04:09
Core Viewpoint
- VAST aims to redefine the boundaries of creativity through the development of general-purpose 3D large models and tools for 3D content creation, establishing a 3D UGC content platform that enhances user experience and productivity [6]

Recruitment Programs
- The company is targeting researchers graduating by December 31, 2026, for its campus recruitment program, offering competitive salaries that exceed top-tier companies along with early-stage equity options [3]
- An internship program is available for researchers graduating in 2027 and beyond, with daily compensation ranging from 1000 to 2000 yuan and no upper limit [4]

Achievements and Highlights
- VAST has launched the world's first one-stop AI 3D workstation, Tripo Studio, generating over $500,000 in monthly revenue and attracting more than 35,000 active users [7]
- The company has developed a state-of-the-art (SOTA) 3D foundational model matrix, contributing to over 30 top-conference papers and 18 open-source projects with more than 20,000 stars on GitHub [8]

Mission and Vision
- The mission includes exploring and creating advanced algorithms for 3D generative models, addressing challenges in high-fidelity geometric detail, editability, dynamic generation, and large-scale scene interaction [14]
- The company emphasizes translating research outcomes into core products and open-source communities, promoting real-world applications of the technology [15]
- VAST encourages team collaboration in shaping the technology roadmap, aiming to lead the industry in technological advances [16]

Target Talent
- The company seeks individuals with a background in computer science or AI who have published innovative research as first authors at top conferences or in top journals [18]
- Candidates with deep expertise in computer vision, graphics, and generative models, and proficiency in Python/PyTorch, are highly valued, especially those with notable GitHub projects or technical blogs [19]
- VAST looks for future creators who believe in the power of technology, possess excellent judgment, and are driven by curiosity and long-term vision to challenge the status quo and redefine future technology directions [20]
A Blind Spot of Transformers: With Only 500 Post-Training Steps, Recurrent Models Break Through the 256k Length-Generalization Limit
机器之心· 2025-07-08 04:09
Core Insights
- The article discusses the advantages of linear recurrent models, such as Mamba, and linear attention mechanisms in handling long sequences, which is crucial for long-context reasoning tasks [1][2]
- It highlights the performance improvements of recurrent models over time, indicating that they can now compete with Transformers on various tasks despite previous limitations [3]
- A significant finding is that recurrent models struggle to generalize beyond their training lengths, leading to performance drops when faced with longer sequences [4][6]

Group 1
- The article presents a solution to the generalization issue in recurrent models through simple training interventions, allowing them to generalize to sequences up to 256k in length with just 500 additional training steps [7]
- The research emphasizes that recurrent models possess untapped potential rather than inherent flaws [7][8]
- The authors propose the "Unexplored States Hypothesis" to explain why recurrent models fail to generalize in length: during training they only learn from a limited subset of the possible states [13][14]

Group 2
- The article outlines four training interventions that improve length generalization by altering the initial state of the model [19]
- These interventions are Random Noise, Fitted Noise, State Passing, and Truncated Backpropagation Through Time (TBTT), each designed to expose the model to a broader range of state distributions [20][19] (a minimal sketch follows this summary)
- The findings reveal that the State Passing and TBTT mechanisms effectively enable length generalization, achieving results with only 0.02% of the original pre-training budget [23][24]

Group 3
- The article discusses the performance of these interventions on various long-context tasks, demonstrating their ability to enhance length generalization [31]
- Specific tasks include the BABILong benchmark, password retrieval, and synthetic copying, where the interventions significantly improved model performance [32][35][39]
- The results indicate that models trained with these interventions can effectively exploit relationships between tokens beyond the training context length [36][39]

Group 4
- The article introduces the concept of "Effective Remembrance" to measure how well a model retains information from previous tokens, with the goal that models focus on recent context rather than distant tokens [44][50]
- It shows that State Passing improves effective memory, allowing models to prioritize recent tokens in their predictions [51][52]
- This adjustment is crucial for text modeling, ensuring that earlier tokens do not disproportionately influence the model's output [52]
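To make the State Passing and Random Noise interventions concrete, here is a minimal sketch that trains a small GRU language model as a stand-in for the linear recurrent architectures studied in the article; the dataset, model size, and hyperparameters are illustrative assumptions rather than the authors' setup.

```python
# Minimal sketch of two initial-state interventions for a recurrent LM:
# "random_noise" starts each chunk from a non-zero, unseen state, while
# "state_passing" reuses the detached final state of the previous chunk.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, CHUNK, BATCH = 256, 128, 64, 8
MODE = "state_passing"   # or "random_noise"

class TinyRecurrentLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, state=None):
        h, state = self.rnn(self.embed(tokens), state)
        return self.head(h), state

model = TinyRecurrentLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def next_chunk():
    # Stand-in data: random tokens; a real run streams contiguous chunks of a
    # long corpus so that carried-over states are meaningful.
    return torch.randint(0, VOCAB, (BATCH, CHUNK + 1))

state = None
for step in range(500):  # the article reports ~500 post-training steps suffice
    tokens = next_chunk()
    inputs, targets = tokens[:, :-1], tokens[:, 1:]

    if MODE == "random_noise":
        init = torch.randn(1, BATCH, DIM) * 0.1      # unseen non-zero start
    elif state is not None:
        init = state.detach()                        # State Passing (no grads
    else:                                            # across chunk boundaries)
        init = None                                  # zero init, first chunk only

    logits, state = model(inputs, init)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The only thing that changes between interventions is how the initial recurrent state of each training chunk is chosen: zeros (the default), random noise, or the detached final state of the previous chunk.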
ICML 2025 | Tsinghua and Shanghai AI Lab Propose MedXpertQA, an Expert-Level Medical Benchmark: How Do o3 and R1 Stack Up?
机器之心· 2025-07-08 04:09
The authors of this paper are from Tsinghua University and Shanghai AI Lab. The corresponding authors are Assistant Professor Ning Ding of Tsinghua University and Professor Bowen Zhou, Chair Professor at Tsinghua University and Director of Shanghai AI Lab. The paper has been accepted to ICML 2025 and has been adopted by DeepMind's MedGemma as an evaluation benchmark.

| Metric | MedGemma 27B | Gemma 3 27B | MedGemma 4B | Gemma 3 4B |
| --- | --- | --- | --- | --- |
| MedQA (4-op) | 89.8 (best-of-5), 87.7 (0-shot) | 74.9 | 64.4 | 50.7 |
| MedMCQA | 74.2 | 62.6 | 55.7 | 45.4 |
| PubMedQA | 76.8 | 73.4 | 73.4 | 68.4 |
| MMLU Med (text only) | 87.0 | 83.3 | 70.0 | 67.2 |
| MedXpertQA (text only) | 26.7 | 15.7 | 14.2 | 11.6 |
| AfriMed-QA | 84.0 | 72.0 | 52.0 | 4 ... |
A Summer-Night Gathering for the RL Community! A 12-Person Casual Chat: When Reinforcement Learning Meets Large-Model Agents
机器之心· 2025-07-08 04:09
Core Viewpoint
- The article promotes an event titled "Reinforcement Learning New Paradigm Exploration Night," emphasizing the integration of reinforcement learning (RL) with large-model agents and highlighting its significance in the current technological landscape [2][3]

Event Details
- The event is scheduled for July 26, 2025, from 19:00 to 21:10, near the Shanghai Expo Exhibition Center, and aims for an intimate gathering of only 12 participants to facilitate deep discussion [3][4]
- The event will cover three main topics: the synergy between reinforcement learning and large-model agents, the dilemma of exploration versus stability in training strategies, and the challenges of aligning and evaluating intelligent agents [4]

Target Audience
- The event is designed for people from academia, industry, and entrepreneurship, encouraging participants to bring their latest research, practical experience, and product challenges for collaborative discussion [5][6]
- The focus is on fostering lively exchanges of ideas rather than formal presentations, aiming for a dynamic and engaging atmosphere [6][7]

Participation Information
- Interested participants are encouraged to scan a QR code to indicate their identity (academic, industry, or entrepreneurial) and the specific RL challenges they wish to discuss; spots are limited [8]
- The article emphasizes the value of meaningful technical discussion and debate, suggesting that the event offers a unique opportunity for networking and collaboration [9]
SJTU Research Lands in a Flagship Nature Sister Journal! Differentiable Physics Achieves End-to-End High-Speed Drone Obstacle Avoidance for the First Time
机器之心· 2025-07-08 00:04
The main authors of this work are from Shanghai Jiao Tong University (SJTU) and the University of Zurich. First author Yuang Zhang is a graduate student at SJTU whose research covers differentiable-physics robotics, multi-object tracking, and AIGC; co-first author Yu Hu is a PhD student at SJTU working on visual navigation for drones; co-first author Dr. Yunlong Song, from the University of Zurich, works on reinforcement learning and optimal control. The corresponding authors are Professors Weiyao Lin and Danping Zou of SJTU.

Imagine a swarm of drones darting like birds through an unknown forest, urban ruins, or obstacle-filled indoor spaces, with no maps, no communication, and no expensive equipment. That vision has now become reality. The SJTU team proposes an end-to-end method that fuses physical modeling of the drone with deep learning. The work is the first to deploy a policy trained with differentiable physics on real robots, achieving autonomous navigation for drone swarms and substantially outperforming existing approaches in robustness and agility. The results have been published online in Nature Machine Intelligence, with Yuang Zhang (M.S.), Yu Hu, and Dr. Yunlong Song as co-first authors and Professors Danping Zou and Weiyao Lin as corresponding authors.

Paper: https://www.nature.com/articles/s42256-025-01048-0

Core idea: simplicity above all. Past approaches to autonomous drone navigation have typically relied on: high-complexity localization and mapping ...
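To illustrate in principle what training a policy "through differentiable physics" means (a toy sketch, not the paper's quadrotor system), the example below writes point-mass dynamics directly in PyTorch so the task loss backpropagates through the entire simulated rollout into the policy network; all dynamics, loss weights, and sizes are assumptions.

```python
# Toy sketch of end-to-end training through a differentiable simulator:
# a point-mass "drone" must reach a goal while staying away from an obstacle.
# Because the dynamics are written in PyTorch, gradients flow from the task
# loss through every simulation step into the policy parameters.
import torch
import torch.nn as nn

torch.manual_seed(0)
DT, STEPS = 0.05, 60
GOAL = torch.tensor([5.0, 0.0])
OBSTACLE, RADIUS = torch.tensor([2.5, 0.1]), 0.5

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout():
    pos, vel = torch.zeros(2), torch.zeros(2)
    loss = torch.tensor(0.0)
    for _ in range(STEPS):
        obs = torch.cat([GOAL - pos, vel])       # simple relative observation
        acc = policy(obs)                        # policy outputs acceleration
        vel = vel + acc * DT                     # differentiable dynamics
        pos = pos + vel * DT
        loss = loss + torch.norm(pos - GOAL)                         # reach goal
        loss = loss + 10.0 * torch.relu(RADIUS - torch.norm(pos - OBSTACLE))  # avoid
        loss = loss + 0.01 * acc.pow(2).sum()                        # smooth control
    return loss / STEPS

for step in range(300):
    loss = rollout()
    opt.zero_grad()
    loss.backward()   # backprop through the whole simulated trajectory
    opt.step()
```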
Breaking: Ruoming Pang (庞若鸣), Head of Apple's Foundation Models Team, Poached by Meta! He Joins the Superintelligence Team at Tens of Millions of Dollars a Year
机器之心· 2025-07-08 00:04
Reported by the 机器之心 editorial team.

Meta's poaching spree continues, and this time the target is Apple. According to the latest report from Bloomberg today, Ruoming Pang, head of Apple's foundation models team and a Distinguished Engineer, is leaving to join Meta. Pang, who moved from Google to Apple in 2021, will become the latest high-profile addition to Meta's newly formed superintelligence team. According to people familiar with the matter, Meta offered a compensation package worth tens of millions of dollars per year to recruit him. Meta CEO Mark Zuckerberg has been on a hiring spree, bringing in top AI leaders at high salaries, including Alexandr Wang of Scale AI, startup founder Daniel Gross, and former GitHub CEO Nat Friedman. According to other sources, on Monday Meta also recruited OpenAI researcher Yuanzhi Li as well as Anton Bakhtin, who worked on Claude at Anthropic PBC. So far, Meta, Apple, Pang, OpenAI, and Anthropic have not responded to Bloomberg's requests for comment on these personnel moves.

Ruoming Pang and his foundation models team

According to his public LinkedIn profile, Pang received his bachelor's degree from Shanghai Jiao Tong University. He worked at Google for 15 years, ...