机器之心
Challenging next-token prediction: are Diffusion LLMs up to the task?
机器之心· 2025-06-08 02:11
Group 1
- The article discusses the potential of Diffusion LLMs, particularly Gemini Diffusion, as a significant breakthrough in AI, challenging traditional autoregressive models [3][4][5]
- Gemini Diffusion demonstrates high generation efficiency, achieving an average sampling speed of 1479 TPS and up to 2000 TPS in coding tasks, outperforming Gemini 2.0 Flash-Lite by 4-5 times [4][6]
- The parallel generation mechanism of the diffusion architecture allows for efficient processing, which could lead to reduced computational costs compared to autoregressive models (a toy sketch of the two decoding loops follows this summary) [6][7]

Group 2
- Mary Meeker emphasizes that the speed of AI development surpasses that of the internet era, highlighting the cost disparity between AI model training and inference [1][2]
- The article suggests that the rise of open-source models in China may impact the global supply chain, indicating a shift in competitive dynamics within the industry [1][2]
- The balance between computational investment and commercial returns is crucial for enterprises as AI inference costs decline [1][2]
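The throughput claims above come down to the decoding pattern: an autoregressive decoder spends one forward pass per emitted token, while a diffusion decoder refines every position of a block in parallel over a fixed number of denoising steps. The loop below is only a hypothetical sketch of that contrast; the `ar_forward` / `diffusion_forward` stubs stand in for real model calls and are not Gemini Diffusion's internals.

```python
import numpy as np

VOCAB, SEQ_LEN, DENOISE_STEPS = 1000, 64, 8
rng = np.random.default_rng(0)

def ar_forward(prefix):
    """Stand-in for one autoregressive forward pass: logits for the NEXT token only."""
    return rng.random(VOCAB)

def diffusion_forward(tokens, step):
    """Stand-in for one denoising pass: logits for EVERY position at once."""
    return rng.random((SEQ_LEN, VOCAB))

# Autoregressive decoding: SEQ_LEN sequential model calls, one new token each.
seq = []
for _ in range(SEQ_LEN):
    logits = ar_forward(seq)
    seq.append(int(logits.argmax()))

# Diffusion-style decoding: start from noisy tokens and refine all positions
# in parallel; only DENOISE_STEPS model calls regardless of sequence length.
tokens = rng.integers(0, VOCAB, size=SEQ_LEN)
for step in range(DENOISE_STEPS):
    logits = diffusion_forward(tokens, step)
    tokens = logits.argmax(axis=-1)

print(f"autoregressive passes: {SEQ_LEN}, diffusion passes: {DENOISE_STEPS}")
```

With 8 refinement passes versus 64 sequential passes for the same 64 tokens, it is the pass count rather than per-pass cost alone that drives the reported TPS gap; real systems trade the number of denoising steps against output quality.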
Six large models face off on the new Gaokao Math Paper I: Doubao and Yuanbao tie for first, while OpenAI o3 crashes to last place
机器之心· 2025-06-07 22:35
Core Viewpoint
- The article discusses the performance of various AI models in tackling high school mathematics exam questions, highlighting the challenges and advancements in AI's reasoning capabilities compared to previous years [3][40].

Group 1: AI Model Performance
- The AI models tested include ByteDance's Doubao, DeepSeek, Alibaba's Tongyi, Tencent's Yuanbao (T1), Baidu's Wenxin X1 Turbo, and OpenAI's o3, with Doubao and Yuanbao achieving the highest scores [8][10].
- Doubao and Yuanbao both scored 68 points, DeepSeek scored 63 points, and Wenxin X1 Turbo scored 51 points, indicating varying levels of success among the models [10][40].
- OpenAI's o3 performed poorly, scoring only 34 points, which raised concerns about its adaptability to the Chinese high school exam format [11][40].

Group 2: Question Types and Scoring
- The mathematics exam consisted of multiple-choice questions, multiple-answer questions, and fill-in-the-blank questions, with specific scoring rules for each type [9][28].
- In the multiple-choice section, Doubao, Tongyi, and Yuanbao scored 35 points each, DeepSeek scored 30 points, and o3 struggled significantly [16][31].
- For the multiple-answer questions, Doubao, DeepSeek, and Yuanbao achieved full marks, while Wenxin X1 Turbo and o3 faced challenges [28][33].
- In the fill-in-the-blank section, four models scored full marks, demonstrating improved performance in this area compared to previous assessments [34][36].

Group 3: Improvements and Challenges
- The AI models showed significant improvement in mathematical reasoning compared to the previous year, with most models surpassing the passing score of 43.8 points [40].
- Enhanced reflection abilities were noted, as models began to re-evaluate their answers when faced with inconsistencies, a notable advance over last year's performance [40][41].
- Despite these improvements, common issues such as calculation errors, weak geometric intuition, and sensitivity to problem conditions remained prevalent among the models [43][44].
AI inference costs are plummeting: what does "Queen of the Internet" Mary Meeker see in it?
机器之心· 2025-06-07 07:00
This article is excerpted from the 机器之心 PRO member newsletter; follow 「机器之心PRO会员」 at the end of the article for more in-depth topic analyses.

"Queen of the Internet" Mary Meeker recently released her Artificial Intelligence Trends Report 2025, which has drawn wide attention across the industry. The 340-page report offers an in-depth analysis of the current state of AI technology, future trends, and its potential impact on the global landscape.

Table of Contents
01. What major trends does Mary Meeker's new report explore? Which trends has Mary Meeker predicted before? What is enabling AI to reshape the world at such speed? How are incumbents and emerging players competing? ...
02. AI model training costs are soaring while inference costs are collapsing? What do computing costs reveal about AI's development? Which factors are driving AI inference costs down? Which groups are affected by falling inference costs? ...
03. How is AI reshaping the physical world? How is AI merging with the physical world? Which real-world scenarios are already clearly benefiting from AI? ...
04. Is AI putting global internet user growth into the "fast lane"? How do the internet and AI reinforce each other? Which internet user behaviors has AI influenced? How does AI's double-edged nature show itself?
05. How is AI rewriting the rules of human survival? Where does the dual pressure on the AI industry come from? What threats does AI monetization pose?

01 What major trends does Mary Meeker's new report explore?
Benchmark ...
60 million fans worldwide and domestic fans clamoring for a launch: our hands-on test of the China version of PixVerse
机器之心· 2025-06-07 03:59
机器之心 original. Author: 张倩

Congratulations to video creators in China: there is now one more capable AI video generation tool to choose from.

"When is your product finally launching in China?"

Recently, 爱诗科技 (the company behind PixVerse) got a taste of what serialized-fiction authors go through: opening its back end, it found its inbox flooded with private messages urging it to launch.

That naturally raises the question: what kind of product has domestic users this eager?

The answer has now been revealed. If you have a wild imagination, this product will pull you in: "Beethoven turns into a muscle man", "the three AI giants pose for a photo", "a pet blinks and turns into a figurine"... if you can think it up, 爱诗科技's new product can make it for you.

The new product is called 「拍我 AI」, the China version of PixVerse, the video generation app that has already built a global following. It is now available in major app stores, and the web version offers a fuller experience.

After trying it out, we found 「拍我 AI」 highly playable. Even if you cannot write a prompt at all, you will not get bored, because it ships with hundreds of templates: just tap "make the same" and swap in your own image. So if you have recently seen a viral AI video on social media and do not know how it was made, browse the 「拍我 AI」 web version and there is a good chance you will find the same template.

Of course, if you are a power user, 「拍我 AI」 offers far more than templates. ...
Unexpectedly, the most open new open-source model comes from Xiaohongshu
机器之心· 2025-06-07 03:59
机器之心 report. Editor: 杨文

The most sweeping open-source release the industry has seen to date.

Xiaohongshu, usually low-key about large models, open-sourced its first self-developed large model yesterday.

The model, dots.llm1, is a text LLM built by Xiaohongshu's hi lab (Humane Intelligence Lab) team.

Its parameter count is not the largest: with 142B total parameters and 14B activated parameters, it is a mid-sized MoE (Mixture of Experts) model, yet it delivers strong performance at that relatively small activation size (a generic sketch of MoE routing follows this entry).

Specifically, with 14B parameters activated, the dots.llm1.inst model performs well on general Chinese and English tasks, math, code, and alignment, and is strongly competitive with Qwen2.5-32B-Instruct and Qwen2.5-72B-Instruct. Against Qwen3-32B, it is close on Chinese/English, math, and alignment tasks.

[Benchmark table, truncated in the source: metrics compared across Qwen-2.5, Qwen-3, DeepSeek, GPT-4o, and dots.llm1 ...]
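The 142B-total / 14B-activated split is the defining property of an MoE layer: a router scores all experts for each token, but only the top-k expert feed-forward blocks actually run, so compute scales with the activated subset rather than the full parameter count. The NumPy sketch below is a generic illustration with made-up sizes, not hi lab's dots.llm1 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_FF, N_EXPERTS, TOP_K = 64, 256, 16, 2

# One weight pair per expert; total parameters scale with N_EXPERTS,
# but each token only touches TOP_K of them.
W_in = rng.standard_normal((N_EXPERTS, D_MODEL, D_FF)) * 0.02
W_out = rng.standard_normal((N_EXPERTS, D_FF, D_MODEL)) * 0.02
W_router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

def moe_layer(x):
    """x: (tokens, D_MODEL) -> (tokens, D_MODEL) using only TOP_K experts per token."""
    scores = x @ W_router                               # (tokens, N_EXPERTS)
    topk = np.argsort(scores, axis=-1)[:, -TOP_K:]      # indices of the chosen experts
    gate = np.take_along_axis(scores, topk, axis=-1)
    gate = np.exp(gate) / np.exp(gate).sum(-1, keepdims=True)  # softmax over chosen experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(TOP_K):
            e = topk[t, slot]
            h = np.maximum(x[t] @ W_in[e], 0.0)         # expert FFN with ReLU
            out[t] += gate[t, slot] * (h @ W_out[e])
    return out

x = rng.standard_normal((4, D_MODEL))
print(moe_layer(x).shape)  # (4, 64)
print(f"activated fraction of expert parameters per token ≈ {TOP_K / N_EXPERTS:.2f}")
```

Here each token touches 2 of 16 experts, roughly an eighth of the expert parameters, which is the same kind of ratio as 14B activated out of 142B total.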
ACL 2025 | Are large language models silently rewriting your code?
机器之心· 2025-06-07 03:59
Core Viewpoint
- The article highlights the issue of "provider bias" in large language models (LLMs) used for code recommendation, which can lead to significant security consequences and affect market fairness and user autonomy [2][5][30].

Group 1: Research Background
- LLMs have shown great potential in code recommendation and have become essential tools for developers. However, they exhibit significant "provider bias," favoring certain service providers even without explicit user instructions [7][30].
- The study reveals that LLMs can silently modify user code to replace original services with preferred providers, undermining user decision-making and increasing development costs [5][7].

Group 2: Methodology
- The research involved constructing an automated dataset and a multi-dimensional evaluation system, analyzing 7 mainstream LLMs across 30 real-world scenarios, resulting in 590,000 responses [12][16].
- The study categorized tasks into six types, including code generation and debugging, to assess the bias in LLM outputs [14][15].

Group 3: Experimental Results
- The analysis showed that all LLMs exhibited a high Gini Index (median of 0.80), indicating a strong preference for specific service providers during code generation tasks (a generic Gini computation is sketched below) [19].
- In the "speech recognition" scenario, the Gini Index reached as high as 0.94, demonstrating a significant reliance on Google's services [19].
- Among 571,057 responses, 11,582 instances of service modification were identified, with Claude-3.5-Sonnet showing the highest modification rate [23].

Group 4: Implications of Provider Bias
- Provider bias can lead to unfair competition in the digital market, as LLMs may be manipulated to favor certain providers, suppressing competitors and fostering digital monopolies [27].
- User autonomy is compromised as LLMs silently replace services in code, potentially increasing project costs and violating corporate policies [27].

Group 5: Limitations and Future Research
- The study acknowledges limitations in dataset coverage, as the 30 scenarios do not fully represent the diversity of real-world programming tasks, and the focus on Python may not reflect biases in other programming languages [28][31].
- Future research should expand to more programming languages and verticals, developing richer evaluation metrics to comprehensively assess provider bias and fairness in LLMs [31].
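A Gini Index near 0 means recommendations are spread evenly across providers, while values near 1 mean one provider dominates. The paper's exact formulation is not reproduced here; the snippet below applies a standard Gini computation to hypothetical provider counts, just to show how a reading around 0.8 or 0.9 arises.

```python
import numpy as np

def gini(counts):
    """Standard Gini coefficient over non-negative counts (0 = even, -> 1 = concentrated)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    # Equivalent to the usual sorted-index formulation of the Gini coefficient.
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

# Hypothetical provider counts in 1,000 generated "speech recognition" snippets.
even_spread  = [125] * 8                      # eight providers used equally
one_dominant = [940, 20, 15, 10, 5, 5, 3, 2]  # one provider almost always chosen

print(round(gini(even_spread), 2))   # 0.0
print(round(gini(one_dominant), 2))  # ≈ 0.84: heavy concentration on one provider
```

The counts here are invented; the point is only that an index in the 0.8-0.9 range corresponds to a handful of providers, or a single one, dominating the recommendations.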
R1-style training no longer judges only whether the answer is right: CUHK releases the SophiaVL-R1 model
机器之心· 2025-06-06 09:36
After DeepSeek-R1 took off, the R1-style outcome-reward training paradigm sparked a reasoning boom across domains. Rule-based outcome rewards are simple to implement and strict to judge. But is that really enough?

In reasoning tasks, if we reward the model only on whether the final answer is right or wrong, it is likely to learn to answer via shortcuts.

Under this regime, the model never fully builds up correct reasoning strategies; a single reward for a lucky guess can even reinforce a flawed strategy over and over, drifting further off course.

To address this, a joint team from CUHK and the Shanghai AI Laboratory released the multimodal reasoning model SophiaVL-R1, a key evolution of the R1-style reinforcement learning framework: instead of rewarding only answer correctness, it also folds the thinking process into the reward.

This design not only lets the model learn more general and reliable reasoning strategies, it also markedly improves generalization: on several math and general multimodal benchmarks, SophiaVL-R1-7B even beats LLaVA-OneVision-72B, a model ten times its size.

The team has open-sourced all models, data, and code.

A good model should be graded on its thinking too

SophiaVL-R1's key breakthrough is its "thinking reward" mechanism: rather than looking only at whether the answer is correct, it also evaluates whether the model's entire reasoning process is sound, coherent, and reliable (a minimal sketch of such a combined reward follows below).

Paper link: ...
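The core idea is that the scalar used for policy updates blends a strict rule-based outcome reward with a score for the reasoning trace itself. SophiaVL-R1's actual reward model and weighting are not reproduced here; the snippet below is a conceptual sketch in which `thinking_score` is a crude stand-in heuristic and the blend weight `w` is an arbitrary assumption.

```python
def outcome_reward(answer: str, gold: str) -> float:
    """Rule-based outcome reward: 1 if the final answer matches the reference, else 0."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def thinking_score(reasoning: str) -> float:
    """Hypothetical judge of the reasoning trace in [0, 1] (a learned reward model in practice).
    Here: a crude heuristic that penalizes suspiciously short traces and explicit guessing."""
    score = 1.0
    if len(reasoning.split()) < 10:
        score -= 0.5                  # too short to contain real reasoning
    if "guess" in reasoning.lower():
        score -= 0.3                  # explicit guessing
    return max(score, 0.0)

def combined_reward(reasoning: str, answer: str, gold: str, w: float = 0.3) -> float:
    """Blend outcome correctness with thinking quality; w is an assumed weight, not the paper's."""
    return (1 - w) * outcome_reward(answer, gold) + w * thinking_score(reasoning)

# A lucky guess with a poor trace now earns less than a correct answer with sound reasoning.
print(combined_reward("I guess it's 42.", "42", "42"))                                    # 0.76
print(combined_reward("Area = base * height / 2 = 6 * 7 / 2 = 21, so 21.", "21", "21"))   # 1.0
```

In practice the thinking score would come from a trained reward model over the full trace rather than string heuristics, but the shape of the objective, outcome term plus process term, is the same.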
Just now, Zhiyuan's all-new "Wujie" (悟界) series of large models made a splash: for the first time, AI truly "sees" both the macro and micro universes
机器之心· 2025-06-06 09:36
Core Viewpoint
- The article discusses the advancements in AI technology, particularly focusing on the launch of the "Wujie" series of large models by Zhiyuan Institute, which signifies a shift from digital to physical world modeling and understanding at both macro and micro levels [4][8][40].

Group 1: AI Advancements and Trends
- The AI field remains vibrant and rapidly evolving, with significant developments in reinforcement learning and various AI domains such as intelligent agents and multimodal models [2][3].
- The annual Zhiyuan Conference showcased insights from leading experts, including Turing Award winners, on the future paths of AI [3].
- The "Wujie" series represents a new phase in large model exploration, focusing on bridging the gap between virtual and physical worlds [4][7].

Group 2: "Wujie" Series Features
- The "Wujie" series includes several key models: Emu3 (multimodal world model), Brainμ (brain science model), RoboOS 2.0 (embodied intelligence framework), and OpenComplex2 (microscopic life model) [6][15][34].
- Emu3 is the first native multimodal world model, integrating modalities such as text, images, and brain signals into a unified representation [14].
- Brainμ is a groundbreaking model in brain science, capable of processing over 1 million neural signal data units and supporting various neuroscience tasks [15][19].

Group 3: Embodied Intelligence Development
- The embodied intelligence sector has become a strategic focus, with the introduction of RoboOS 2.0 and RoboBrain 2.0, which enhance the capabilities of embodied AI systems [20][22].
- RoboOS 2.0 introduces a user-friendly framework for developers, significantly reducing the complexity of deploying robotic systems [24].
- RoboBrain 2.0 is noted for its superior performance in task planning and spatial reasoning, achieving a 74% improvement in task planning accuracy compared to its predecessor [27].

Group 4: Microscopic Life Modeling
- OpenComplex2 marks a significant advancement in modeling microscopic life, capable of predicting static and dynamic structures of biological molecules [34][38].
- The model has demonstrated its effectiveness by successfully predicting protein structures in a competitive evaluation, showcasing its potential in the life sciences [36].
- OpenComplex2 aims to accelerate drug discovery and biological research by providing a new modeling pathway for understanding molecular dynamics [38].

Group 5: Future Directions
- The "Wujie" series reflects a strategic upgrade in AI paradigms, emphasizing the importance of modeling the physical world and integrating various AI domains [40].
- The future of large models is expected to extend beyond traditional applications, influencing systems that understand and change the world [41].
A "killer combo" for MoE inference: Ascend × Pangu pushes inference performance up 6-8x
机器之心· 2025-06-06 09:36
Core Viewpoint
- The article emphasizes the significant advancements in the Pangu Pro MoE 72B model developed by Huawei, highlighting its efficiency in large model inference through innovative techniques and optimizations, which have led to substantial performance improvements in AI applications [2][23].

Group 1: Model Performance and Optimization
- The Pangu Pro MoE model achieves a 6-8x improvement in inference performance through system-level optimizations, including high-performance operator fusion and model-native speculative algorithms [3][23].
- The model's throughput reaches 321 tokens/s on the Ascend 300I Duo and can soar to 1528 tokens/s on the Ascend 800I A2, showcasing its capability to fully leverage hardware potential [3][24].

Group 2: Hierarchical and Hybrid Parallelism
- Huawei introduces a novel Hierarchical & Hybrid Parallelism (H2P) strategy, which enhances efficiency by allowing specialized communication and computation without requiring all components to engage simultaneously [6][7].
- This strategy results in a 33.1% increase in decode throughput compared to traditional parallel processing methods [7].

Group 3: Communication Optimization
- The TopoComm optimization scheme reduces static overhead and improves data transmission efficiency, achieving a 35% reduction in synchronization operations and a 21% increase in effective bandwidth [9][12].
- The introduction of mixed-quantization communication strategies leads to a 25% reduction in communication data size and a 39% decrease in AllGather communication time [9].

Group 4: Operator Fusion and Efficiency
- The development of fusion operators such as MulAttention and SwiftGMM addresses the inefficiencies of traditional operators, significantly improving memory access and computation scheduling [15][18].
- MulAttention achieves a 4.5x acceleration in attention computation, while SwiftGMM reduces inference latency by 48.7% [16][18].

Group 5: Dynamic Pruning and Collaborative Optimization
- The PreMoE dynamic pruning algorithm enhances inference throughput by over 10% by selectively activating the experts relevant to a specific task (a conceptual sketch follows this summary) [21].
- The TrimR and SpecReason algorithms optimize the reasoning process, reducing unnecessary computation and improving throughput by 30% [20][22].

Group 6: Overall System Optimization
- The comprehensive optimization of the Ascend Pangu inference system establishes a robust foundation for high-performance, large-scale, and cost-effective AI model deployment [28].
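Several of the optimizations above share one principle: skip work that a given request does not need. Purely as a conceptual illustration of the dynamic-pruning idea (hypothetical router statistics and threshold, not PreMoE's actual algorithm), the sketch below profiles which experts a prompt actually routes to and drops the rest before decoding, shrinking the weights that must be touched per token.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, D_MODEL = 64, 128

def task_expert_profile(prompt_acts, router_w):
    """Average routing probability per expert over the prompt, used as a task profile."""
    logits = prompt_acts @ router_w                         # (tokens, N_EXPERTS)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    return probs.mean(axis=0)                               # (N_EXPERTS,)

def prune_experts(profile, keep_mass=0.95):
    """Keep the smallest expert set whose cumulative routing mass covers `keep_mass`."""
    order = np.argsort(profile)[::-1]                       # experts by descending usage
    cum = np.cumsum(profile[order])
    k = int(np.searchsorted(cum, keep_mass)) + 1
    return np.sort(order[:k])                               # indices of retained experts

router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.5
prompt = rng.standard_normal((32, D_MODEL))                 # stand-in prompt activations

profile = task_expert_profile(prompt, router_w)
kept = prune_experts(profile)
print(f"retained {kept.size}/{N_EXPERTS} experts for this request")
# Decoding would then route only among `kept`, cutting per-token weight traffic.
```

The threshold and profiling window here are assumptions; the general point is that routing statistics gathered per task let an MoE server load and schedule only a subset of experts, which is where throughput gains of the kind reported above come from.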