机器之心
Hands-on with Vidu Q1's reference-to-video feature: Zhuge Liang, Churchill, and Napoleon pose for a commemorative photo at the Great Wall
机器之心· 2025-07-11 08:27
机器之心 report. Editor: Youli. This time it really is different: we have met the "god of imagination"! People used to say "live as if you were a whole team"; thanks to AI, that has now actually come true. Recently, Shengshu Technology's AI video model Vidu Q1 launched a reference-to-video feature that drastically simplifies the traditional content production pipeline, truly making "one person an entire film crew"! First, watch a video: these figures should all be familiar. Zhuge Liang, waving his feather fan and declaring "I never imagined the world held someone so utterly shameless" in countless meme videos; Churchill, Britain's iron-willed prime minister; and Napoleon, whose battle record speaks for itself, now cross time and space to sit around a conference table in close conversation, a "summit for the ages"! A conventional AI image-to-video workflow would typically involve scriptwriting, text-to-image / photo editing / image compositing, image generation, image-to-video conversion, and final editing, but here it took only three photos plus Vidu Q1's reference-to-video feature! Like putting an elephant into a refrigerator, it takes just three steps: upload the photos, write a prompt, and get the finished clip. By this point, what makes Vidu Q1's reference-to-video feature so unusual should be clear. An even flashier demonstration comes from X user Alex, an artist and programmer: in her hands, the 1989 Batman and the 1993 Jurassic Park T. rex not only share the frame but stage a fierce "fight", ...
Was Musk bluffing? The first wave of Grok 4 hands-on tests is in: it can crush o3, yet is also bad enough that it cannot count six fingers
机器之心· 2025-07-11 08:27
机器之心 report, by the 机器之心 editorial team. Netizens are paying serious money to try Grok 4. Yesterday, Musk appeared at the Grok 4 launch event and proudly declared that Grok now performs at a postdoctoral level in every discipline, without exception, and could even make new scientific discoveries within the year. This immediately piqued the interest of netizens worldwide; despite Grok 4's steep price, many willingly paid to try it for themselves. One user compared the outputs of Grok 4 and o3 using the same prompt. Prompt: Create a HTML, CSS, and javascript where a ball is inside a rotating hexagon. The ball is affected by Earth's gravity and friction from the hexagon walls. The bouncing must appear realistic. Some readers may object: didn't o3-mini complete this task smoothly in earlier tests? For details, see the 机器之心 article 《 o3 ...
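The prompt above is essentially a small rigid-body physics test: apply gravity every frame, then resolve wall collisions by splitting the ball's velocity into normal and tangential components, with restitution on the former and wall friction on the latter. A minimal Python sketch of that collision response, with hypothetical parameter values (not anyone's actual submission):

```python
def integrate(pos, vel, dt=1 / 60, g=9.8):
    # Apply gravity, then advance the position by one frame.
    vx, vy = vel
    vy -= g * dt
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)

def bounce(vel, normal, restitution=0.9, friction=0.2):
    # Split velocity into components normal and tangential to the wall:
    # the normal part is reflected and damped by restitution, while the
    # tangential part is damped by wall friction.
    nx, ny = normal
    vn = vel[0] * nx + vel[1] * ny              # signed normal speed
    tx, ty = vel[0] - vn * nx, vel[1] - vn * ny  # tangential component
    return (-restitution * vn * nx + (1 - friction) * tx,
            -restitution * vn * ny + (1 - friction) * ty)

# A ball falling onto a horizontal floor (wall normal pointing up):
print(bounce((0.0, -10.0), (0.0, 1.0)))  # -> (0.0, 9.0)
```

A full demo would additionally rotate the hexagon each frame and apply the bounce to the ball's velocity relative to the moving wall, but the decomposition above is the core of "realistic" bouncing that the test probes.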
Breaking LLM coding's "data contamination" and "inflated capability" dilemma: the Meituan-M17 team builds OIBench, a new-generation standard for AI coding evaluation
机器之心· 2025-07-11 02:43
Core Insights
- The article highlights the significant gap between the proclaimed programming capabilities of large language models (LLMs) and their actual performance in rigorous evaluations, pointing to a "cognitive gap" between marketing claims and reality [3][28].

Evaluation Framework
- The Meituan-M17 team developed the OIBench dataset to provide a more accurate and differentiated assessment of LLMs' programming abilities, addressing the limitations of existing evaluation systems [3][8].
- OIBench consists of 212 high-difficulty algorithm problems, specifically designed to avoid data leakage and ensure high-quality assessment [10][11].

Model Performance
- An evaluation of 18 mainstream models revealed that even the top performer, o4-mini-high, scored only 36.35, a substantial gap from human competition level [5][19].
- Many models, such as GPT-4o and Claude 3.5 Sonnet, showed low success rates on complex problems, underscoring the limits of their capabilities [4][19].

Comparison with Human Competitors
- OIBench innovatively compared model performance with that of human competitors from top universities, yielding more reliable and reproducible data than traditional Elo rating systems [23][24].
- Models like o4-mini-high outperformed 42% of human competitors, but overall many models struggled to surpass even 20% of human participants [30][31].

Future Directions
- The article emphasizes the need for ongoing collaboration between academia and industry to improve the evaluation of LLMs and their integration into real-world applications [28][34].
- A newly introduced competition focused on human-machine collaboration aims to bridge the gap between current evaluation methods and practical software development [39].
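The human-comparison claims above ("better than 42% of human competitors") reduce to a simple percentile computation over human scores. A sketch with made-up scores, not OIBench's actual competitor data:

```python
def fraction_beaten(model_score, human_scores):
    # Percentage of human competitors scoring strictly below the model.
    beaten = sum(1 for s in human_scores if s < model_score)
    return 100.0 * beaten / len(human_scores)

# Hypothetical human scores; 36.35 is o4-mini-high's reported OIBench score.
humans = [12.0, 20.5, 33.0, 48.0, 71.5]
print(fraction_beaten(36.35, humans))  # -> 60.0
```

Unlike Elo, which depends on the pairing schedule, this direct percentile is reproducible from the raw score lists, which is the advantage the article attributes to OIBench's comparison.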
Yes, LeCun will report to 28-year-old Alexandr Wang! Exclusive inside details on Meta's new AI team
机器之心· 2025-07-11 02:43
Core Viewpoint
- Meta's aggressive recruitment strategy in the AI sector has raised questions about its sustainability and its potential impact on company culture and performance [2][24].

Group 1: Recruitment and Team Structure
- Meta has made headlines by offering exorbitant salaries, reportedly up to $200 million for key talent, to attract AI experts from competitors such as OpenAI and Apple [3][4].
- The newly formed Meta Superintelligence Labs (MSL), led by Alexandr Wang, is a focal point of interest regarding its operational structure and research direction [5].
- Significant internal restructuring is underway, with senior executives allowed to recruit their own teams, which may lead to internal competition and integration challenges [21][22].

Group 2: Internal Dynamics and Culture
- Concerns have been raised about the impact of these changes on Meta's corporate culture, with reports of a "fear culture" emerging from performance evaluations and ongoing layoffs [24].
- A lack of clear vision and strategic confusion has been noted, particularly within the Llama team, where many employees are unclear about the company's goals [24].
- The retention rate of top talent recruited from other companies is low, indicating potential issues with employee satisfaction and organizational stability [24].

Group 3: Research Focus and Distinctions
- The Fundamental AI Research (FAIR) division operates independently of the GenAI and MSL teams, focusing on long-term foundational research rather than product development [8][16].
- The Llama team, initially part of FAIR, was moved to the GenAI product group after the success of Llama 1, highlighting the distinction between exploratory research and product-oriented development [15][16].
- The controversy around the Llama 4 model, including allegations of leaderboard "ranking cheating", has raised questions about Meta's technical reputation and credibility in the AI field [24].
Meta spent $200 million on him: Shanghai Jiao Tong alumnus Ruoming Pang shows off his latest paper from Apple
机器之心· 2025-07-10 10:49
Core Viewpoint
- The article discusses Ruoming Pang's transition from Apple to Meta, highlighting his contributions to Apple's foundation models and the development of AXLearn, a modular large-model training system designed for heterogeneous infrastructure.

Group 1: Ruoming Pang's Transition
- Ruoming Pang, head of Apple's foundation model team, is moving to Meta's newly established superintelligence team, with a reported offer of $200 million [2][3].
- Despite the transition, Pang continues to contribute to Apple by promoting his research on AXLearn [3][4].

Group 2: AXLearn Overview
- AXLearn is a production-grade system for large-scale deep learning model training, emphasizing scalability and high performance [6].
- The system features a modular design and comprehensive support for heterogeneous hardware, allowing functionality such as Rotary Position Embeddings (RoPE) to be integrated with minimal code [6][8].
- A new modularity metric based on lines of code (LoC-complexity) is introduced, showing that AXLearn maintains constant complexity as the system grows, whereas other systems exhibit linear or quadratic growth [7][23].

Group 3: Performance Evaluation
- AXLearn's training performance is compared with systems such as PyTorch FSDP, Megatron-LM, and MaxText across various hardware platforms, demonstrating competitive iteration times and throughput [26][29].
- The system shows near-linear scalability in weak-scaling experiments, indicating robustness under increased workloads [30].

Group 4: Production Use and Impact
- AXLearn has evolved from a tool for a few developers into a large platform supporting hundreds of developers training models with billions to trillions of parameters [35].
- It can run more than 10,000 experiments concurrently and is deployed across heterogeneous hardware clusters, powering features used by billions of users [36][37].
Goodbye data "noise": UCSD's new large-model reasoning method DreamPRM acts as a "signal amplifier" and tops the MathVista leaderboard
机器之心· 2025-07-10 10:49
Core Viewpoint
- DreamPRM, developed by a research team at the University of California, San Diego, has reached the top of the MathVista mathematical reasoning leaderboard, showcasing significant advances in multimodal reasoning [1][6][22].

Introduction
- DreamPRM uses a bi-level optimization framework to enhance the reasoning abilities of multimodal large language models (MLLMs), addressing challenges such as data-quality imbalance and distribution shift [2][12].

Methodology
- The core innovation of DreamPRM is to cast the training of the process reward model (PRM) as a differentiable bi-level optimization problem, dynamically adjusting domain weights to mitigate issues in multimodal reasoning [12][22].
- The lower-level optimization trains PRM parameters across 15 diverse training domains, assigning dynamic weights that reflect each domain's contribution to the overall loss [13][14].
- The upper-level optimization uses a carefully constructed metadata set covering 30 disciplines and 183 subfields to evaluate the PRM's generalization capability [12][14].

Performance Results
- DreamPRM demonstrated superior performance across five benchmarks, consistently outperforming other PRM methods and beating the original PRM without data selection by 2-3% [16][22].
- With only 8 billion parameters, the model surpassed larger closed-source models such as GPT-4v and Gemini-1.5 on most benchmarks, indicating strong reasoning capability [16][22].
- DreamPRM's accuracy improves as the number of candidate reasoning chains (CoTs) increases, with further gains when applied to stronger models such as GPT-4.1-mini and o4-mini [19][20].

Conclusion
- DreamPRM effectively addresses data-quality imbalance and distribution shift in training multimodal process reward models, achieving notable gains, particularly on complex mathematical reasoning tasks [22].
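The lower-level objective described above, per-domain PRM losses scaled by learnable domain weights, can be illustrated with a small sketch. The weights here are fixed illustrative numbers; in DreamPRM they are optimized through the differentiable upper level, and the domain names are hypothetical:

```python
def weighted_domain_loss(domain_losses, domain_weights):
    # Aggregate per-domain PRM losses, scaling each domain by its weight
    # so that cleaner, better-matched domains contribute more to training.
    total = sum(domain_weights[d] * loss for d, loss in domain_losses.items())
    return total / sum(domain_weights.values())

losses = {"geometry": 1.00, "algebra": 0.50, "charts": 0.80}
weights = {"geometry": 2.0, "algebra": 1.0, "charts": 0.5}
print(round(weighted_domain_loss(losses, weights), 4))  # -> 0.8286
```

Down-weighting a noisy domain (a small weight on "charts", say) is the "signal amplifier" effect the headline refers to: low-quality data contributes less to the PRM's gradient.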
We ran a topic-pitch meeting on Feishu and stepped straight into modern office work; the editorial team's verdict: we're converts
机器之心· 2025-07-10 10:49
Core Viewpoint
- The article discusses Feishu's launch of several AI-driven products at the 2025 Future Unlimited Conference, highlighting advances in AI applications and tools designed to boost productivity and efficiency in business settings [1][4].

Group 1: AI Applications
- Feishu introduced the industry's first AI-application maturity model, categorizing AI applications into four levels; Feishu's Knowledge Q&A reaches the M3 standard, meaning it is mature enough for large-scale use [7][9].
- The Knowledge Q&A tool can search vast document collections: in one case, a large enterprise had created 9.4 million documents, and users could still retrieve specific information quickly [9][10].
- The tool also features privacy protection, ensuring that encrypted data is visible only to the user [22].

Group 2: Meeting Solutions
- The Feishu Meeting product has achieved the M4 standard, generating intelligent meeting summaries and action items in real time with high accuracy [23][24].
- The system can differentiate speakers during meetings, producing clear summaries and attributing action items to the right individuals [26][32].
- A personalized report summarizing key discussion points and tasks from the past week helps users track their work progress [35].

Group 3: Data Management Tools
- Feishu's Multi-dimensional Table has been significantly upgraded, now supporting up to 10 million rows with millisecond-level calculation speeds, the first product of its kind to handle such data volumes [40].
- A new application mode lets users build business systems through a drag-and-drop interface without coding skills, making business applications more accessible [42].
- The Multi-dimensional Table has nearly 10 million monthly active users and is set to integrate with DingTalk and WeChat Work, breaking down ecosystem barriers [44].

Group 4: Development Tools
- Feishu launched a development suite including Feishu Miaotai, the first enterprise AI system-building tool, which enables rapid product development with AI [45].
- Feishu aPaaS has undergone multiple AI upgrades, supporting the development of AI-integrated business systems [45].
- Bringing AI capabilities into every workflow node enables automated data entry and analysis, improving operational efficiency [46].
A revival of the encoder-decoder architecture? Google releases 32 T5Gemma models in one go
机器之心· 2025-07-10 08:35
Core Viewpoint
- The article discusses the launch of Google's T5Gemma models, highlighting their potential to revive the encoder-decoder architecture for large language models (LLMs) and their competitive performance against existing models [1][12].

Group 1: Model Launch and Features
- Elon Musk's announcement of Grok 4 drew the AI community's attention, while Google continued to update its Gemma series models [1][2].
- Google introduced a series of multimodal models for health AI development, including MedGemma 4B and 27B, which assist with diagnosis and provide medical advice [3][4].
- T5Gemma, built on the Gemma 2 framework, uses an adaptation technique to convert pretrained decoder-only models into encoder-decoder architectures, offered in various configurations and sizes [5][8][9].

Group 2: Performance and Efficiency
- T5Gemma performs on par with or better than the decoder-only Gemma models, dominating the quality-efficiency trade-off across multiple benchmarks [21][24].
- In practical applications, T5Gemma showed clear advantages on tasks such as GSM8K, achieving higher accuracy at similar or lower latency than smaller models [22][23].
- The flexibility of the adaptation method allows different encoder and decoder sizes to be combined, enabling tailored solutions for specific tasks [18][19].

Group 3: Research and Development Insights
- Google explored the feasibility of building top-tier encoder-decoder models from pretrained decoder-only models, with promising results on complex reasoning tasks [15][28].
- T5Gemma showed substantial improvements across benchmarks, indicating its potential for building more powerful foundation models [28][31].
- The article suggests these advances could trigger a resurgence of encoder-decoder models in the LLM era [12][33].
July 19, see you in Beijing! Let's talk about the hottest research at ACL 2025
机器之心· 2025-07-10 08:35
Core Insights
- The AI field remains an exciting area in 2025, with a stream of research releases from major tech companies and institutions [1].
- The rapid pace of advances in AI is overwhelming, with new models and paradigms emerging almost weekly [3][4].
- Developers and researchers are increasingly attending conferences and academic sharing sessions to stay abreast of cutting-edge research [5].

Event Overview
- The ACL conference, a flagship event in the NLP field, received over 8,000 submissions this year, a record high [6].
- ACL 2025 will take place from July 27 to August 1 in Vienna, Austria, featuring keynote speeches, paper presentations, roundtable discussions, and poster sessions [6][7].
- The event aims to provide a platform for domestic AI talent, with a full schedule of presentations and discussions announced [6].

Keynote Speakers and Topics
- The keynote on "Trends and Outlook for ACL 2025" will be delivered by Che Wanxiang, a prominent professor at Harbin Institute of Technology [9][17].
- Liu Pengfei of Shanghai Jiao Tong University will present on "Reinforcement Learning and Complex Reasoning in Large Models" [11][19].

Paper Presentations
- Papers to be presented cover topics such as the intrinsic self-correction of large language models and the acceleration of inference in large language models [9][12].
- The event will also feature poster sessions and opportunities for industry engagement [21].
From "plastic figures" to "flesh and blood": a physics revolution in character animation, with PhysRig delivering more realistic and natural character deformation
机器之心· 2025-07-10 08:35
Personal homepage: https://haoz19.github.io/ Author: Zhang Hao, a PhD student at the University of Illinois Urbana-Champaign, researching 3D/4D reconstruction, generative modeling, and physics-driven animation. He is currently a research intern at Snap and previously interned at Stability AI and the Shanghai AI Laboratory. PhysRig is a joint work by UIUC and Stability AI, aimed at pushing character animation toward more realistic, controllable physics-based solutions.

Project page: https://physrig.github.io
Paper: https://arxiv.org/abs/2506.20936
Code: https://github.com/haoz19/PhysRig

Do animated characters often look "plasticky" when they move? Even with elaborate skeleton rigs, characters still walk like hinged puppets. That is because the current mainstream rigging technique, Linear Blend Skinning (LBS), though efficient and computationally convenient, tends to produce volume loss, distortion, and even the "candy-wrapper" effect on soft materials such as skin, fat, or animal tails, seriously hurting realism. In the paper newly accepted at ICCV 2025, 《PhysRig: Diffe ...
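The Linear Blend Skinning the paragraph criticizes deforms each vertex as a weighted sum of bone transforms, v' = sum_i w_i (T_i v), and the artifacts come precisely from blending transforms linearly. A minimal pure-Python sketch of the standard formula (the bone transforms and weights below are illustrative, not from PhysRig):

```python
def lbs_vertex(vertex, weights, transforms):
    # Linear Blend Skinning: v' = sum_i w_i * (T_i @ v), with v in
    # homogeneous coordinates and each T_i a 4x4 bone transform.
    v = list(vertex) + [1.0]
    out = [0.0, 0.0, 0.0]
    for w, T in zip(weights, transforms):
        for r in range(3):
            out[r] += w * sum(T[r][c] * v[c] for c in range(4))
    return out

def translate_x(d):
    # 4x4 transform translating along x by d (a toy bone motion).
    return [[1, 0, 0, d], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# A vertex influenced half by a static bone, half by a bone moved +2 in x:
print(lbs_vertex((1.0, 2.0, 3.0), [0.5, 0.5],
                 [translate_x(0.0), translate_x(2.0)]))  # -> [2.0, 2.0, 3.0]
```

Because the blend is linear in the transforms, a vertex weighted between two bones rotated in opposite directions collapses toward the joint axis, which is the "candy-wrapper" effect; a physics-based deformation like PhysRig's is meant to avoid exactly this.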