机器之心
Attention, new graduates! Shanghai AI Lab's campus recruitment channel is now open: 100+ positions and 700+ offers to bring research ideals into reality!
机器之心· 2025-08-21 04:12
Group 1
- The article announces the launch of the 2026 global campus recruitment for the Shanghai Artificial Intelligence Laboratory, offering over 100 positions [1]
- The laboratory seeks individuals who are not only skilled in algorithms but also excel in complex engineering and are eager to validate technology in real-world scenarios [3]
- Candidates are encouraged to pursue challenging and innovative research, focusing on fundamental issues rather than settling for easy achievements [3]

Group 2
- The recruitment is targeted at graduates from January 2025 to October 2026, with specific categories for "Dream New Stars," "Academic New Stars," "Engineering New Stars," and "Competition New Stars" [4]
- There are six categories of positions available, including algorithm, research and development, product, operations, solutions, and functional/support roles [6][7]
- The application process includes online submissions starting from August 20, 2025, followed by a series of written tests and interviews [10][11]

Group 3
- The laboratory provides a top-tier research platform with extensive computational resources and data support, encouraging candidates to engage in scalable and impactful projects [12][13]
- Candidates can apply by scanning a QR code or contacting the provided assistant for any issues during the application process [14]
Beating Meta for the top spot: ReasonRank, a reasoning-enhanced document ranking model, is here
机器之心· 2025-08-21 04:12
The first author of this paper, Liu Wenhan, is a third-year PhD student at the Gaoling School of Artificial Intelligence, Renmin University of China, advised by Professor Dou Zhicheng, and currently an intern in Baidu's search division. His research focuses on AI search, and he has published multiple papers at top international conferences such as ACL and WWW.

Large Reasoning Models have greatly advanced natural language processing, and document ranking is one of the core problems in information retrieval. How to use a powerful reasoning model to actively reason about document relevance and then rank the documents is a direction worth exploring.

In this work, we propose ReasonRank. On multiple leaderboards, including BRIGHT and R2MED, ReasonRank beat submissions from universities and organizations including UMass, Waterloo, and Meta, taking first place on August 9, 2025. Our smaller ReasonRank-7B also far outperforms other 32B-scale reasoning rerankers, while offering a clear efficiency advantage over pointwise rankers. In addition, our paper reached first place on the Hugging Face daily papers leaderboard.

| Rank | Retriever | Score |
| --- | --- | --- |
| ... | ... | ... |
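As a rough illustration of how a reasoning LLM can be used as a listwise reranker, here is a minimal sketch. It assumes an OpenAI-compatible chat client and a hypothetical model name; the prompt format is a guess for illustration and is not ReasonRank's actual template or training setup.

```python
# Minimal listwise-reranking sketch (not the official ReasonRank code).
# Assumes an OpenAI-compatible endpoint; the model name is hypothetical.
import re
from openai import OpenAI

client = OpenAI()  # assumes API key configured in the environment

def rerank(query: str, docs: list[str], model: str = "reasoning-reranker") -> list[int]:
    """Ask a reasoning LLM to order candidate passages by relevance.

    Returns document indices in ranked order; the prompt wording is illustrative.
    """
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "Reason about which passages answer the query, then output the passage "
        "indices from most to least relevant as a comma-separated list."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    # Keep only valid indices, preserving the model's order and dropping duplicates.
    order = [int(tok) for tok in re.findall(r"\d+", text) if int(tok) < len(docs)]
    seen, ranking = set(), []
    for i in order:
        if i not in seen:
            seen.add(i)
            ranking.append(i)
    return ranking
```

Compared with a pointwise scorer, a listwise call like this sees all candidates at once, which is where the efficiency and quality trade-off discussed above comes from.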
Context memory on par with Genie 3, and released earlier: HKU and Kling propose a scene-consistent interactive video world model
机器之心· 2025-08-21 01:03
Core Insights
- The article discusses the development of video generation models that can maintain scene consistency over long durations, addressing the critical issue of stable scene memory in interactive long video generation [2][10][17]
- Google DeepMind's Genie 3 is highlighted as a significant advancement in this field, demonstrating strong scene consistency, although technical details remain undisclosed [2][10]
- The Context as Memory paper from a research team at the University of Hong Kong and Kuaishou is presented as a leading academic work that closely aligns with Genie 3's principles, emphasizing implicit learning of 3D priors from video data without explicit 3D modeling [2][10][17]

Context as Memory Methodology
- The Context as Memory approach utilizes historical generated context as memory, enabling scene-consistent long video generation without the need for explicit 3D modeling [10][17]
- A Memory Retrieval mechanism is introduced to efficiently utilize theoretically infinite historical frame sequences by selecting relevant frames based on camera trajectory and field of view (FOV), significantly improving computational efficiency and reducing training costs [3][10][12]

Experimental Results
- Experimental comparisons show that Context as Memory outperforms existing state-of-the-art methods in maintaining scene memory during long video generation [15][17]
- The model demonstrates superior performance in static scene memory retention over time and exhibits good generalization across different scenes [6][15]

Broader Research Context
- The research team has accumulated multiple studies in the realm of world models and interactive video generation, proposing a framework that outlines five foundational capabilities: Generation, Control, Memory, Dynamics, and Intelligence [18]
- This framework serves as a guiding direction for future research in foundational world models, with Context as Memory being a focused contribution on memory capabilities [18]
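To make the Memory Retrieval idea concrete, here is a hedged sketch of selecting historical frames by field-of-view overlap with the current camera. The yaw-only geometry, threshold, and frame budget are assumptions for illustration, not the paper's implementation.

```python
# Illustrative FOV-overlap memory retrieval (assumed mechanics, not released code).
# Each memory frame stores the camera yaw it was generated with; we keep only
# frames whose horizontal field of view overlaps the current camera's view.
from dataclasses import dataclass

@dataclass
class MemoryFrame:
    frame_id: int
    yaw_deg: float  # camera heading when the frame was generated

def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def retrieve(memory: list[MemoryFrame], cur_yaw: float,
             fov_deg: float = 90.0, k: int = 8) -> list[MemoryFrame]:
    """Return up to k memory frames whose FOV overlaps the current FOV.

    Two views are treated as overlapping when their heading difference is below
    the FOV width; the closest headings are preferred so the conditioning
    context stays small.
    """
    overlapping = [f for f in memory if angular_diff(f.yaw_deg, cur_yaw) < fov_deg]
    overlapping.sort(key=lambda f: angular_diff(f.yaw_deg, cur_yaw))
    return overlapping[:k]
```

The point of such a filter is that conditioning on a handful of geometrically relevant frames, rather than the entire history, keeps compute roughly constant as the video grows.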
Just in: ByteDance open-sources the Seed-OSS-36B models with a 512k context
机器之心· 2025-08-21 01:03
Reported by 机器之心 (机器之心 editorial team)

The open-source race is getting lively.

Late at night, ByteDance's Seed team officially released and open-sourced the Seed-OSS model series, which includes three versions:

- Seed-OSS-36B-Base (with synthetic data)
- Seed-OSS-36B-Base (without synthetic data)
- Seed-OSS-36B-Instruct (instruction-tuned)

Seed-OSS was trained on 12 trillion (12T) tokens and achieves strong results on several mainstream open-source benchmarks. All three models are released under the Apache-2.0 license, allowing researchers and enterprise developers to freely use, modify, and redistribute them.

Key features:

Model architecture: Seed-OSS-36B combines several common design choices, including causal language modeling, Grouped Query Attention, the SwiGLU activation function, RMSNorm, and RoPE positional encoding. Each model has 36 billion parameters spread across 64 layers and supports a 155K-entry vocabulary. One of its most distinctive features is native long-context support, with a maximum context length of up to 512k tokens ...
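Since the weights are Apache-2.0, a natural first step is loading them with Hugging Face transformers. The snippet below is a hedged sketch; the repository id is an assumption for illustration, so check the Seed team's official release page for the exact name and recommended settings.

```python
# Hedged sketch: loading an open-weight instruct checkpoint with transformers.
# The repo id below is assumed for illustration, not confirmed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/Seed-OSS-36B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 36B model needs multiple GPUs or offloading
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize why long context matters."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```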
Registration open | Zhongguancun International Youth Forum: inviting young scholars worldwide to explore the AI frontier together
机器之心· 2025-08-20 09:47
Core Viewpoint
- Beijing Zhongguancun Academy is a new higher education and research institution focusing on artificial intelligence and interdisciplinary fields, emphasizing disruptive research and practical talent cultivation [2][3]

Group 1: Institutional Overview
- Beijing Zhongguancun Academy specializes in education and research innovation in artificial intelligence and interdisciplinary fields, promoting scientific exploration through research projects [3]
- The Zhongguancun Artificial Intelligence Research Institute is a young exploratory R&D institution aimed at future-oriented scientific exploration with industrial value [3]

Group 2: International Forum
- The "Zhongguancun International Youth Forum," organized by Beijing Zhongguancun Academy and supported by the Zhongguancun Artificial Intelligence Research Institute, invites global young talents in AI and interdisciplinary fields [5]
- Since its establishment in September 2024, the forum has successfully held two sessions, attracting 98 top young scholars from seven countries, covering topics in AI, biotechnology, and interdisciplinary integration [5]

Group 3: Forum Details
- The upcoming forum will take place on September 18-19, 2025, at the Beijing Zhongguancun Academy C5 Research Building [6]
- Key agenda items include invited reports from top scholars, oral presentations for young scholars, roundtable discussions on "AI for Science," and a poster session for knowledge exchange [6][9]

Group 4: Talent Development and Support
- The academy offers a comprehensive talent introduction policy, including support for project applications, housing subsidies, and education for children to ensure talent development [6]
- The institution collaborates with 31 top universities and leading enterprises to implement a project-based education model [14]

Group 5: Participation Requirements
- Candidates must hold a PhD in AI or related interdisciplinary fields with at least two years of experience and a record of publications in top conferences or journals [15]
- The application deadline is August 27, 2025, and interested scholars should submit their materials via email [15]
What Sora couldn't do, the LongVie framework has solved: SOTA in ultra-long video generation
机器之心· 2025-08-20 09:47
From Sora's stunning debut to the arrival of several high-performing open-source models, video generation has made explosive progress over the past two years and can now produce high-quality clips tens of seconds long. However, generating ultra-long videos that exceed one minute while keeping content and motion controllable and the style consistent remains a major challenge.

To address this, Shanghai AI Laboratory, together with Nanjing University, Fudan University, S-Lab at Nanyang Technological University, NVIDIA, and other institutions, proposed the LongVie framework to systematically tackle the core problems of controllable long video generation.

- Project page: https://vchitect.github.io/LongVie-project/
- Video: https://www.youtube.com/watch?v=SOiTfdGmGEY&t=1s
- Paper: https://arxiv.org/abs/2508.03694
- GitHub: https://github.com/Vchitect/LongVie

Diagnosing the difficulties: directly using current controllable video generation models to produce minute-scale long videos typically leads to the following problems:

- Temporal inconsistency: details and content drift between frames, producing flicker.
- Visual degradation: color drift and loss of sharpness as the video gets longer.

Resolving temporal inconsistency with two key strategies: LongVie works along two paths, the control signal and the initial noise. 1. Global normalization of the control signals ...
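To illustrate what global normalization of a control signal can look like, here is a small sketch that normalizes per-clip depth maps with statistics computed over the whole video. This is one plausible reading of the idea, not LongVie's released implementation.

```python
# Hedged sketch of global control-signal normalization (an assumed reading of
# LongVie's "global normalization" strategy, not its actual code).
# Normalizing every clip's depth control with video-level statistics keeps the
# conditioning scale consistent across clips.
import numpy as np

def normalize_control_globally(depth_clips: list[np.ndarray]) -> list[np.ndarray]:
    """Normalize per-clip depth maps using mean/std computed over the whole video."""
    all_values = np.concatenate([c.reshape(-1) for c in depth_clips])
    mean, std = all_values.mean(), all_values.std() + 1e-6
    return [(c - mean) / std for c in depth_clips]

# For contrast, per-clip normalization would recompute mean/std inside the loop,
# so the same absolute depth could map to different control values in different
# clips, which is one plausible source of temporal drift.
```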
A "free lunch" for dLLMs! Zhejiang University and Ant Group use intermediate results to significantly improve diffusion language models
机器之心· 2025-08-20 04:26
The first author, Wang Wen, is a PhD student at Zhejiang University working on multimodal understanding and generation. The corresponding author, Shen Chunhua, is a Qiushi Chair Professor at Zhejiang University whose research covers embodied intelligence, reasoning enhancement for large models, reinforcement learning, and general perception models.

In recent years, diffusion large language models (dLLMs) have been rapidly emerging as a new force in text generation. Unlike traditional autoregressive (AR) models that generate left to right, token by token, dLLMs rely on an iterative denoising mechanism: they can generate multiple tokens at once and show distinctive advantages in dialogue, reasoning, and creative tasks. While a traditional LLM is still squeezing out an answer one token at a time, a dLLM can produce a complete result in a few iterations, delivering unprecedented generation efficiency.

However, faster does not mean better answers. Existing dLLM decoding strategies typically look only at the output of the final iteration, discarding the rich semantic and reasoning information contained in the intermediate iterations. Those overlooked intermediate predictions may in fact hold answers that are more accurate and closer to the truth; throwing them away wastes information and can cost the model its best chance of getting the question right.

Even more surprisingly, on mathematical reasoning tasks the research team observed a "right first, then wrong" phenomenon: the model first arrives at ...
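One simple way to exploit those intermediate predictions, sketched below, is to extract an answer at every denoising iteration and take a majority vote instead of trusting only the final pass. This aggregation rule is an illustrative assumption, not necessarily the method proposed in the paper.

```python
# Hedged sketch: majority-vote over answers extracted at every denoising
# iteration, rather than keeping only the final iteration's output.
from collections import Counter
import re

def extract_answer(text: str) -> str | None:
    """Pull the last numeric answer out of a generated solution, if any."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None

def vote_over_iterations(intermediate_texts: list[str]) -> str | None:
    """Aggregate answers seen across all denoising iterations by majority vote."""
    answers = [a for a in (extract_answer(t) for t in intermediate_texts) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Example of the "right first, then wrong" case: the correct answer appears in
# intermediate steps and survives the vote even though the final step drifted.
print(vote_over_iterations(["... so the result is 12", "answer: 12", "answer: 21"]))  # -> "12"
```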
Is DiT mathematically and formally wrong? Saining Xie responds: don't do science in your head
机器之心· 2025-08-20 04:26
Core Viewpoint
- The article discusses criticisms of the DiT model, highlighting potential architectural flaws and the introduction of a new method called TREAD that significantly improves training efficiency and image generation quality compared to DiT [1][4][6]

Group 1
- A recent post on X claims that DiT has architectural defects, sparking significant discussion [1]
- The TREAD method achieves a training speedup of 14x/37x on the FID metric when applied to the DiT backbone network, indicating better generation quality [2][6]
- The post argues that DiT's FID stabilizes too early during training, suggesting it may have "latent architectural defects" that prevent further learning from data [4]

Group 2
- TREAD employs a "token routing" mechanism to enhance training efficiency without altering the model architecture, using a partial token set to save information and reduce computational costs [6]
- Saining Xie, an author of the original DiT paper, acknowledges the criticisms and emphasizes the importance of experimental validation over theoretical assertions [28][33]
- Xie also points out that DiT's architecture has some inherent flaws, particularly in its use of post-layer normalization, which is known to be unstable for tasks with significant numerical range variations [13][36]

Group 3
- The article mentions that DiT's design relies on a simple MLP network for processing critical conditional data, which limits its expressive power [16]
- Xie highlights that the real issue with DiT lies in its sd-vae component, which is inefficient and has been overlooked for a long time [36]
- The ongoing debate around DiT reflects the iterative nature of algorithmic progress, where existing models are continuously questioned and improved [38]
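The token-routing idea can be sketched as follows: a random subset of tokens bypasses the middle transformer blocks and is re-inserted afterwards, so those blocks process fewer tokens per training step. The module below is an illustrative approximation, not the TREAD authors' implementation; the keep ratio and routing placement are assumptions.

```python
# Hedged sketch of TREAD-style token routing: routed tokens go through the
# wrapped blocks, skipped tokens pass through unchanged and are re-inserted.
import torch
import torch.nn as nn

class RoutedBlocks(nn.Module):
    def __init__(self, blocks: nn.ModuleList, keep_ratio: float = 0.5):
        super().__init__()
        self.blocks = blocks          # the middle transformer blocks to wrap
        self.keep_ratio = keep_ratio  # fraction of tokens that are processed

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, tokens, dim)
        b, n, d = x.shape
        n_keep = max(1, int(n * self.keep_ratio))
        perm = torch.randperm(n, device=x.device)
        keep_idx = perm[:n_keep]
        routed = x[:, keep_idx]       # only these tokens enter the blocks
        for blk in self.blocks:
            routed = blk(routed)
        out = x.clone()               # skipped tokens keep their original values
        out[:, keep_idx] = routed
        return out
```

Because the skipped tokens are preserved rather than discarded, the later layers still see the full token set, which is how such a scheme cuts compute without changing the model architecture.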
Forum registration is open, secure your seat now! Decoding the deployment challenges and industry breakout points of embodied intelligence
机器之心· 2025-08-20 04:26
Core Insights
- The article emphasizes that embodied intelligence is becoming the core battlefield of the next technological competition, representing a significant step in integrating digital intelligence into the physical world [2][5]
- It highlights the rapid advancements in this field over recent months, showcasing various milestones such as robots performing at events and challenges, while questioning the distance to true cross-scenario implementation [2][5]
- The article discusses the need to overcome core bottlenecks, particularly generalization capabilities, to enable robots to operate effectively in dynamic environments and create sustainable commercial value [2][5]

Event Overview
- The 2025 Inclusion·Bund Conference will take place from September 10 to 13, 2025, in Shanghai, focusing on embodied intelligence [3]
- A forum titled "Embodied Intelligence: From Generalization to Action, Reshaping the Future of Industries" will be held on September 11, featuring various discussions and presentations from industry leaders and experts [3][4]

Forum Agenda
- The forum will include keynote speeches, thematic presentations, and roundtable discussions, addressing the technological innovations needed for robots to achieve true generalization and actionable capabilities [5][9]
- Notable speakers include experts from Tsinghua University, NVIDIA, and various robotics companies, discussing topics such as efficient data simulation, the next steps for embodied intelligence, and commercialization pathways [8][9][12][13][15][19]
ICCV 2025 | Crossing the boundary between vision and language to open a new chapter in human-object interaction perception: a Peking University team proposes the INP-CC model to reshape open-vocabulary HOI detection
机器之心· 2025-08-20 00:15
Core Viewpoint
- The article discusses a novel open-vocabulary human-object interaction (HOI) detection method called Interaction-aware Prompt and Concept Calibration (INP-CC), which enhances the understanding of interactions in open-world scenarios by dynamically generating interaction-aware prompts and optimizing concept calibration [2][4][5]

Summary by Sections

Introduction to HOI Detection
- Current HOI detection methods are limited to closed environments and struggle to identify new interaction types, which restricts their practical applications [6]
- The rise of multimodal large models presents significant potential for application in open environments, making the study of their use in HOI detection a focal point [6]

Innovations of INP-CC
- INP-CC introduces two core innovations: Interaction-aware Prompt Generation and Concept Calibration, which help the model better understand complex interaction semantics [7][16]
- The model employs a mechanism that allows for selective sharing of prompts among similar interactions, enhancing learning efficiency [7]

Model Architecture
- INP-CC utilizes an interaction-adaptive prompt generator to dynamically construct relevant prompts based on the input image characteristics, improving the model's focus on key interaction areas [14]
- The model generates detailed visual descriptions of interactions and clusters them into a fine-grained conceptual structure, aiding in the understanding of complex interactions [14][20]

Experimental Performance
- INP-CC outperforms existing methods on the HICO-DET and SWIG-HOI datasets, achieving a mean Average Precision (mAP) of 16.74% on the SWIG-HOI full test set, which is nearly a 10% improvement over the previous method CMD-SE [18][22]
- The model demonstrates strong attention capabilities, effectively focusing on critical interaction areas, as evidenced by visual analysis [23]

Conclusion
- INP-CC breaks through the limitations of pre-trained visual language models in regional perception and concept understanding, showcasing the potential of integrating language model knowledge into computer vision tasks [25]
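As a loose illustration of the concept-clustering step, the sketch below embeds textual interaction descriptions and groups them into clusters. The encoder choice, cluster count, and example descriptions are assumptions for illustration; this is not the INP-CC pipeline.

```python
# Illustrative sketch of clustering interaction descriptions into concept groups.
# The encoder and hyperparameters are assumptions, not the paper's setup.
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

descriptions = [
    "a person riding a bicycle on the street",
    "a person repairing a bicycle with tools",
    "a person feeding a horse some hay",
    "a person riding a horse across a field",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
embeddings = encoder.encode(descriptions)

# Group semantically similar interaction descriptions into concept clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for desc, lab in zip(descriptions, labels):
    print(lab, desc)
```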