机器之心
AI Video Generation Enters the Era of "Acting Generation": Shengshu Technology Launches Vidu Q2 Globally
机器之心· 2025-09-25 05:24
Published by 机器之心 | 机器之心 Editorial Department. Only when AI video stops competing on high-definition pixels alone and starts competing on acting does it truly enter the highest form of content production: film-grade storytelling. On September 25, Shengshu Technology's next-generation image-to-video model Vidu Q2 officially launched worldwide. It tackles long-standing industry problems of AI generation, namely stiff expressions, erratic motion, limited range of movement, and the inability to control output precisely, achieving a leap from "video generation" to "acting generation" and from "smooth motion" to "emotional expression." This marks AI video generation moving from pursuing physical resemblance to pursuing expressive likeness, and will bring upgrades to content creation, the film and television industry, and advertising. Vidu Q2's image-to-video feature handles not only dialogue scenes with complex, subtle expression changes and common multi-person fight scenes, but also the spectacular effects of blockbuster films. Compared with the Vidu Q1 model released in the first half of this year, Vidu Q2's image-to-video feature improves substantially in fine-grained expression generation, push-pull camera movement, semantic understanding, generation speed, and duration options, with four major highlights. In addition, to meet different user needs for generation speed versus quality, Vidu Q2 image-to-video offers a Lightning mode and a Cinematic mode. In Lightning mode, just 20 seconds is enough to generate 5 ...
The First Code World Model Ignites the AI Community: Teaching Agents "True Reasoning," Open-Sourced by Meta
机器之心· 2025-09-25 03:20
Core Insights - The article discusses the introduction of the Code World Model (CWM) by Meta, which is a significant advancement in AI for code generation and reasoning [1][2][4]. Group 1: Model Overview - CWM is a 32 billion parameter open-weight large language model (LLM) designed to enhance code generation through world modeling [7]. - It supports a maximum context length of 131k tokens and is structured as a dense, decoder-only LLM [8]. - The model has shown strong performance in general programming and mathematical tasks, achieving a pass rate of 96.6% on Math-500 and 76.0% on AIME 2024 [6]. Group 2: Training and Methodology - To improve code understanding, the Meta FAIR CodeGen team utilized extensive observation-action trajectories in a Python interpreter and agent-based Docker environment for mid-training [12]. - CWM was trained on a large dataset of coding data and customized Python + Bash world modeling data, enabling it to simulate Python function execution and agent interactions in Bash [22]. Group 3: Performance Metrics - CWM achieved notable performance in various benchmarks, including a pass rate of 35.1% in the Aider Polyglot benchmark and 65.8% in SWE-bench Verified with test-time extension [23][26]. - In comparison to other models, CWM demonstrated competitive results, particularly in time and space complexity predictions, outperforming baseline models in all metrics [29]. Group 4: Future Research Directions - Meta envisions CWM bridging the gap between language-level reasoning and executable semantics, with potential applications in zero-shot planning and reinforcement learning [30]. - The model's ability to predict the consequences of its actions is expected to enhance efficiency in interactions with environments, allowing for more complex task handling [30].
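The summary above describes CWM's mid-training on observation-action trajectories from a Python interpreter: the model learns how local variable state evolves as each line of a program executes. A minimal sketch of what such a trace might look like, using `sys.settrace` to record per-line local state for a toy function; the trace format here is an illustrative assumption, not Meta's actual data pipeline:

```python
import sys

def trace_locals(func, *args):
    """Record (relative line, locals snapshot) steps while func runs."""
    steps = []

    def tracer(frame, event, arg):
        # Only record line events inside the target function
        if event == "line" and frame.f_code is func.__code__:
            steps.append((frame.f_lineno - func.__code__.co_firstlineno,
                          dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, steps

def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, steps = trace_locals(running_sum, 4)
print(result)        # final return value
print(steps[-1][1])  # local variable state just before the return
```

Each `(line, locals)` pair is one observation-action step; world modeling in the CWM sense trains a model to predict the next state from the program and the steps so far, rather than only imitating final code.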
Embodied AI Can Now "Listen While Speaking": BAAI Open-Sources RoboBrain-Audio, a Natively Full-Duplex Speech Model
机器之心· 2025-09-25 03:20
Paper: https://arxiv.org/abs/2509.02521 Hugging Face model page: https://huggingface.co/CofeAI/FLM-Audio Together with Nanyang Technological University, BAAI officially released RoboBrain-Audio (FLM-Audio), the first natively full-duplex spoken dialogue LLM supporting "natural monologue + dual training paradigm." In a natural conversation recording, the user asks several different questions in succession and interrupts the model mid-answer multiple times. RoboBrain-Audio consistently pauses its current output promptly, understands the new question accurately, and answers immediately, demonstrating the full-duplex capability, robustness, and naturalness required in real conversation. RoboBrain-Audio adopts a native full-duplex architecture, delivering a leap in response latency and conversational naturalness over traditional TDM (time-division multiplexing) models, while its language understanding is markedly stronger than other native full-duplex models, marking the shift of embodied agents from "able to listen and speak" to "listening while speaking." According to public data, audio foundation models in industry are now trained on tens of millions to hundreds of millions of hours of data, giving them an edge in voice cloning and long-response generation, whereas RoboBrain-Audio uses only 100 ...
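Full-duplex dialogue means the system keeps listening while it is speaking and can cut its own reply short the moment the user barges in. A conceptual sketch of that control flow with `asyncio`; this is a toy analogy for the interaction pattern, not the model's actual audio architecture:

```python
import asyncio

async def speak(text, cancel_event):
    """Stream a reply word by word; stop as soon as an interruption arrives."""
    spoken = []
    for word in text.split():
        if cancel_event.is_set():      # user barged in mid-reply
            return "interrupted", spoken
        spoken.append(word)
        await asyncio.sleep(0.01)      # simulated audio-chunk duration
    return "finished", spoken

async def listen(cancel_event, interrupt_after):
    """Simulate the user interrupting after a short delay."""
    await asyncio.sleep(interrupt_after)
    cancel_event.set()                 # new user speech detected

async def dialogue():
    cancel = asyncio.Event()
    reply = "the answer to your question is as follows " * 10
    (status, spoken), _ = await asyncio.gather(
        speak(reply, cancel), listen(cancel, 0.05))
    return status, spoken

status, spoken = asyncio.run(dialogue())
print(status, len(spoken))
```

A half-duplex (TDM-style) system would only check for new input after finishing its whole reply; the key property here is that listening and speaking run concurrently, so the interruption takes effect within one chunk.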
Adversarial Collaboration + Prototype Learning: Shenzhen MSU-BIT University Open-Sources FedPall, Tackling Feature Drift in Federated Learning with SOTA Accuracy
机器之心· 2025-09-24 09:25
Core Viewpoint - The article discusses the FedPall algorithm, which addresses the feature drift problem in federated learning by combining prototype-based adversarial and collaborative learning techniques, achieving state-of-the-art performance across various datasets [2][10]. Methodology - The FedPall framework introduces an adversarial learning mechanism between clients and the server, enhancing feature representation alignment in a unified feature space through collaborative learning [3]. - A hierarchical integration strategy is developed to combine global prototypes with local features, facilitating client-server collaboration [5]. - The server trains a shared global amplifier and utilizes KL divergence to enhance heterogeneous information from different clients, mapping raw data to a unified feature space [5]. - The global classifier is distributed to each client, replacing the local classifiers to improve generalization and mitigate feature drift [6]. Performance Evaluation - FedPall was evaluated on three publicly available feature drift datasets: Digits, Office-10, and PACS, demonstrating superior accuracy compared to classical methods and state-of-the-art baselines [8][10]. - In the Office-10 dataset, FedPall achieved an overall accuracy approximately 3 percentage points higher than the second-best method, ADCOL [10]. - The Digits dataset results showed FedPall outperforming all other models, with an accuracy exceeding the second-best model, FedBN, by about 1.1 percentage points [10]. - FedPall consistently maintained higher average accuracy across all three datasets compared to ADCOL, with improvements ranging from 1.1 to 3 percentage points [12]. Future Directions - The research aims to validate the FedPall framework's generalization capabilities across other data modalities and task types in future studies [13].
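Prototype-based federated methods like the one summarized above typically represent each class by a mean feature vector computed on each client, then aggregate these prototypes on the server. A minimal sketch of that building block, using plain averaging; FedPall's adversarial training, KL-based enhancement, and global classifier are not reproduced here:

```python
import numpy as np

def local_prototypes(features, labels, num_classes):
    """Mean feature vector per class on one client (a common prototype definition)."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def global_prototypes(client_protos):
    """Server-side aggregation: average each class prototype across clients."""
    return np.mean(np.stack(client_protos), axis=0)

rng = np.random.default_rng(0)
clients = []
for shift in (0.0, 1.0):  # two clients whose features drift apart (shifted means)
    feats = rng.normal(shift, 1.0, size=(100, 8))
    labels = rng.integers(0, 3, size=100)
    clients.append(local_prototypes(feats, labels, 3))

g = global_prototypes(clients)
print(g.shape)
```

The global prototypes give every client a shared anchor per class; methods in this family then pull drifted local features toward these anchors so that one classifier can serve all clients.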
Seven Large Models in One Go, and the Rest of the World Is Envious: Alibaba's Full-Stack Upgrades at the Yunqi Conference Go All Out
机器之心· 2025-09-24 09:23
Core Insights - Alibaba has made significant advancements in AI technology, unveiling a comprehensive suite of new models at the 2025 Yunqi Conference, showcasing breakthroughs across various modalities [2][4][41] - The flagship model, Qwen3-Max, has surpassed competitors like GPT-5 and Claude Opus 4, achieving a total parameter count exceeding 1 trillion and demonstrating enhanced capabilities in understanding and coding [6][8][41] - The company aims to invest over 380 billion yuan in cloud and AI hardware infrastructure over the next three years, indicating a strong commitment to advancing AI capabilities [43][45] Model Developments - The Qwen3-Max model has two versions: Instruct and Thinking, with significant improvements in Chinese and English comprehension, complex instruction adherence, and programming abilities [8][10] - Qwen3-Next, the next-generation foundational model architecture, introduces innovations like mixed attention mechanisms and high sparsity MoE architecture, achieving a parameter count of 80 billion while maintaining high efficiency [12][14] - Qwen3-Coder, a specialized programming model, has been upgraded to enhance code generation and completion capabilities, with a notable increase in API call volume by 1474% [18][19] Multi-Modal Capabilities - Qwen3-VL, a powerful visual language model, has been released, excelling in visual understanding and reasoning tasks, outperforming competitors in key benchmarks [21][22][23] - The model can autonomously operate computer and mobile interfaces, recognizing GUI elements and generating executable code based on design sketches [23][25] - Qwen3-Omni, a comprehensive multi-modal model, has been launched, achieving state-of-the-art performance in 32 out of 36 audio-visual evaluation tasks [26][27] Future Directions - Alibaba's strategy includes maintaining an open-source approach for its models, positioning itself as a leader in the AI landscape, and aiming to replace modern operating systems with large models as the primary interface for user needs [45][47] - The company envisions the future of AI as moving beyond AGI towards superintelligent AI (ASI), indicating a transformative shift in the industry [43][45]
ICCV 25 Highlight | "Early Warning" in the Diffusion Process Delivers a 6x Speedup: Efficient Backdoor Defense for AIGC Image Generation
机器之心· 2025-09-24 09:23
The first author, Zhai Shengfang, and co-first author, Li Jiajun, are from Peking University, where they study the security and privacy of generative models. The other collaborators are from the National University of Singapore, Tsinghua University, Zhejiang University, and Virginia Tech. As AIGC image generation grows popular, backdoor attacks pose a serious threat to the flourishing open-source community, yet backdoor defenses built for traditional classification models do not transfer to AIGC image generation. Various input-level backdoor defenses have been studied for traditional (mainly classification) models, blocking malicious samples by checking whether an input carries a suspicious trigger. These defenses rest on one assumption, trigger dominance: once the trigger fires, the model's output is almost fully controlled, and its confidence barely changes even if other words or pixel regions of the malicious input are modified. To address this gap, the paper first analyzes neurons to characterize the phenomenon of "early activation differences" during image generation. Building on this, it proposes an efficient input-level backdoor defense framework (NaviT2I) that detects suspicious samples via neuron-activation differences and accelerates detection through an analysis of the diffusion process, meeting real-time deployment requirements. 1. Research Background: Diffusion-based image generation has recently flourished, letting users generate photorealistic images from text descriptions. As multiple third-party organizations successively open-source models [1, 2 ...
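The detection idea, flagging inputs whose early-step neuron activations deviate abnormally from a clean reference, can be illustrated with a toy outlier test on synthetic activation statistics. The statistic, threshold, and data below are illustrative assumptions, not the NaviT2I implementation:

```python
import numpy as np

def fit_reference(clean_activations):
    """Mean/std of early-step activation norms over known-clean prompts."""
    norms = np.linalg.norm(clean_activations, axis=1)
    return norms.mean(), norms.std()

def is_suspicious(activation, mean, std, z_thresh=3.0):
    """Flag an input whose early-step activation norm is a statistical outlier."""
    z = abs(np.linalg.norm(activation) - mean) / std
    return z > z_thresh

rng = np.random.default_rng(1)
clean = rng.normal(0, 1, size=(500, 64))     # stand-in early-step activations
mean, std = fit_reference(clean)

benign = rng.normal(0, 1, size=64)
triggered = rng.normal(0, 1, size=64) + 2.0  # simulated backdoor-induced shift
print(is_suspicious(benign, mean, std), is_suspicious(triggered, mean, std))
```

Checking activations at an early denoising step, rather than running the full diffusion trajectory, is what makes this style of screening cheap enough for real-time deployment; the paper reports a 6x speedup from that early cutoff.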
Digital Intelligence Empowerment: Transformation Breakthroughs and Future Building in the Construction and Real Estate Industry
机器之心· 2025-09-24 07:48
Core Insights - The construction and real estate industry is a cornerstone of human civilization and a key pillar of the global economy, demonstrating strong resilience amid changing times [1] - The ESG concept is driving green development as an industry consensus, while digital transformation is crucial for operational innovation and enhancing product competitiveness [1] Group 1: Industry Trends - The demand for high-quality living is a global consensus, leading to an upgrade in the need for "good houses, good communities, and good urban areas," which drives companies to focus on "product strength" as a core competitive advantage [4] - Companies that are keenly capturing this trend have initiated transformations, with Huawei emerging as a significant partner in the industry's transition through its understanding of "good products" and digital practices [4] Group 2: Digital Transformation - The core value of new productive forces lies in achieving efficiency and quality upgrades across the entire "investment, financing, construction, management, and operation" process through digital technologies [6] - AI empowerment is expected to evolve from tool assistance to intelligent decision-making across the entire industry chain, shifting the competitive focus to spatial and asset operation capabilities [6] Group 3: Technological Integration - In the design phase, large model technology is reshaping creativity and review logic, enhancing review efficiency and establishing a quality feedback loop through knowledge-driven design [6][8] - In operations, technology integration addresses management pain points, supporting the transformation of real estate investment and operation businesses into the AI era [8] Group 4: Future Outlook - Digital intelligence is not only a necessary path for the transformation of the construction and real estate sector but also a core support for achieving green, low-carbon, and high-quality development [10] - Huawei aims to continue deepening its engagement in the industry, using digital intelligence technologies and ecological collaboration to co-create a smarter and better living environment [10]
Rising AI4S Talent Gathers at "SAIS Talk: Starry Night at the Shanghai Institute of Scientific Intelligence (上智院)": Five Frontier Talks Await
机器之心· 2025-09-24 07:48
Core Insights - The article emphasizes the role of the younger generation in driving innovation in the field of artificial intelligence, particularly in scientific research [2] - The Shanghai Institute of Scientific Intelligence (上智院) is highlighted as the world's first research institute focused on AI for Science, aiming to transform scientific research paradigms and empower various industries [2] - The SAIS Talk event showcases promising young researchers sharing their innovative work in scientific intelligence, indicating a vibrant future for AI in scientific discovery [3] Group 1: Event Overview - The SAIS Talk has successfully held 15 sessions, featuring speakers from diverse backgrounds, including top scholars and active researchers, to foster inspiration and collaboration [3] - The event on September 26 will feature five young researchers discussing topics such as representation learning, catalytic reaction prediction, and global weather forecasting [3] Group 2: Research Highlights - Research on hierarchical spatiotemporal representation and cross-scale implicit autoregressive modeling significantly improves long-term prediction accuracy in dynamic systems [5] - The RXNGraphormer framework unifies the prediction of chemical reaction performance and synthesis planning, achieving leading performance across multiple prediction tasks [10] - A 4D diffusion model framework for protein dynamics and conformational generation offers new computational paradigms for understanding protein functions and accelerating drug design [13] - The SCRIPT framework for single-cell gene regulatory relationship prediction shows over twofold improvement in long-range regulatory predictions, with implications for complex disease genetic diagnostics [17] - FuXi-Weather, a machine learning-based global weather forecasting system, demonstrates superior performance in sparse observation areas compared to traditional numerical weather prediction systems [21]
The Robotics World's "ImageNet Moment": Fei-Fei Li's Team Announces a Premier Global Embodied AI Challenge
机器之心· 2025-09-24 02:31
In the history of computer vision, the ImageNet challenge was hailed as a watershed in AI development, igniting the deep learning wave. Will embodied intelligence and robotics see a similar inflection point? The answer may be taking shape. Fei-Fei Li's team and the Stanford AI Lab have officially announced that the first BEHAVIOR Challenge will debut at NeurIPS 2025. It is a "super benchmark" tailored for embodied intelligence, covering the 1,000 most important daily tasks in real household settings (cooking, cleaning, tidying, ...), and for the first time centering the competition on 50 complete long-horizon tasks, testing whether robots can perform operations that genuinely resemble human daily life in a realistic virtual environment. Why is BEHAVIOR worth attention? Unlike previous fragmented benchmarks, BEHAVIOR is the first to posit that a true home robot must simultaneously master cross-room navigation, bimanual fine manipulation, long-horizon planning, and dynamic adaptation. Unprecedented task scale: 1,000 household activities and 50 complete long-horizon challenges, averaging 6.6 minutes of continuous operation per task. High-fidelity simulation: OmniGibson, a high-fidelity simulator built on NVIDIA Omniverse, supports complex physical interactions such as cloth folding, liquid pouring, and heating/freezing. Unprecedented data: 10,000 expert teleoperation demonstrations, ...
Just Now: Sam Altman Publishes a Post Revealing the Massive Undertaking OpenAI Is Pursuing
机器之心· 2025-09-24 02:31
Core Insights - OpenAI is significantly expanding its computational power through a partnership with Nvidia, which involves a $100 billion investment and the deployment of at least 4 million GPUs to create a super AI infrastructure [1][3] - The company plans to establish five new AI data centers in the U.S. as part of its Stargate initiative, aiming to enhance its capacity to nearly 7 gigawatts, sufficient to power over 5 million households [1][3] - OpenAI's CEO, Sam Altman, emphasizes that robust computational power is essential for realizing the full potential of artificial intelligence, which is crucial for ensuring widespread benefits from AI advancements [3][5] Summary by Sections Investment and Infrastructure - Nvidia's investment of $100 billion will support OpenAI in building a super AI infrastructure with a focus on computational power [1] - The Stargate plan includes five new data centers, which, along with existing facilities, will increase OpenAI's planned capacity to nearly 7 gigawatts [1][3] Strategic Goals - OpenAI aims to meet its previously announced commitment of 10 gigawatts of computational power by accelerating the development of its infrastructure [1][3] - The company is focused on creating a scalable AI infrastructure that can support the training and inference needs of next-generation AI models [3][5] Vision for AI Development - Altman envisions a future where access to AI services becomes a fundamental driver of economic growth and potentially a basic human right [5][6] - The company is committed to innovating across all technological layers to achieve its ambitious goal of producing 1 gigawatt of new AI infrastructure weekly [6]
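The capacity figures above can be sanity-checked with quick arithmetic; the comparison to a typical average household draw of roughly 1.2-1.4 kW is an assumption about US usage, not a number from the article:

```python
# Sanity-check: 7 GW of planned capacity serving "over 5 million households"
capacity_w = 7e9               # planned Stargate capacity, in watts
households = 5_000_000
per_household_w = capacity_w / households
print(per_household_w)         # average watts available per household

# Scale of the stated buildout goal: 1 GW of new AI infrastructure per week
weekly_gw = 1
weeks_to_match = capacity_w / 1e9 / weekly_gw
print(weeks_to_match)          # weeks of buildout equal to current planned capacity
```

At 1,400 W per household, the "over 5 million households" framing is consistent with an average household draw in the low kilowatts, and the 1 GW-per-week goal would replicate the entire planned 7 GW footprint in under two months of buildout.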