机器之心
Behind Ling-1T, the model Andrew Ng is following: Ant Group's Ling 2.0 technical report reveals the open-source recipe for a trillion-parameter model
机器之心· 2025-10-29 07:23
Core Insights
- The article highlights the launch of Ant Group's open-source model Ling-1T, which demonstrates performance close to top proprietary models despite being a non-reasoning model, indicating a significant technological shift in AI development [2][3].

Group 1: Model Performance and Comparison
- Ling-1T achieved impressive benchmark scores, outperforming several leading models across tasks, such as 92.19 on C-Eval and 96.87 on MBPP [2].
- The model's performance is attributed to its unique architecture and training methodologies, which blur the line between reasoning and non-reasoning models [3].

Group 2: Technical Report and Design Philosophy
- Ant Group released a comprehensive technical report titled "Every Activation Boosted," detailing the construction of scalable reasoning-oriented models from 16 billion to 1 trillion parameters [6][7].
- The report emphasizes a systematic approach to enhancing reasoning capabilities, focusing on sustainable and scalable AI development amid rising computational costs [8].

Group 3: Architectural Innovations
- Ling 2.0 employs a highly sparse architecture with a total of 256 experts, activating only 8 per token, yielding roughly a 7-fold computational efficiency gain over dense models (a minimal routing sketch follows below) [11].
- The model's design is guided by the Ling Scaling Laws, which use low-cost experiments to predict performance and optimal hyperparameters for large-scale models [19].

Group 4: Pre-training and Mid-training Strategies
- The pre-training phase used a dataset of 20 trillion tokens with a focus on reasoning, raising the proportion of reasoning data from 32% to 46% [22].
- An innovative mid-training phase introduced high-quality reasoning-chain data, strengthening the model's reasoning potential before fine-tuning [24].

Group 5: Reinforcement Learning Innovations
- Ling 2.0 introduced a novel reinforcement learning algorithm, Linguistic-unit Policy Optimization (LPO), which optimizes at the sentence level and significantly improves training stability and generalization [36][38].
- The model also incorporates a Group Arena Reward mechanism for subjective tasks, improving the reliability of reward signals during training [42].

Group 6: Infrastructure and Engineering Insights
- The training of Ling-1T used full-stack FP8 training, matching BF16 quality while improving computational efficiency by 15% [48].
- The report candidly discusses challenges encountered during training, emphasizing the importance of algorithm-system co-design for effective large-scale model training [56][57].

Group 7: Broader Implications and Future Directions
- The release of Ling 2.0 is positioned as a significant contribution to the open-source community, providing a comprehensive framework for building scalable AI models [59].
- The report argues that advances in AI do not rely solely on computational power; they can also come from innovative engineering and precise predictive methodologies [60].
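To make the sparsity figures in Group 3 concrete, here is a minimal sketch of top-k expert routing as used in sparse Mixture-of-Experts layers. The hidden sizes, module names, and softmax-then-top-k ordering are illustrative assumptions, not Ling 2.0's actual implementation.

```python
# Minimal sketch of sparse top-k expert routing (256 experts, 8 active per token).
# Dimensions and the routing recipe are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=2048, d_ff=4096, num_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                      # x: [num_tokens, d_model]
        gate_probs = F.softmax(self.router(x), dim=-1)         # [T, num_experts]
        weights, idx = gate_probs.topk(self.top_k, dim=-1)     # keep only 8 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the kept gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                         # naive loop; real systems batch by expert
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Only 8 of the 256 expert MLPs are evaluated for any given token, which is where the claimed activated-compute savings relative to an equally sized dense model come from.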
Perturbing high-entropy words at inference time to boost LLM performance
机器之心· 2025-10-29 01:07
Core Insights
- The article discusses emerging research on test-time scaling for large language models (LLMs), highlighting the phenomenon of localized uncertainty during inference, where a small number of high-entropy words significantly impact output correctness [2][20].

Methodology
- The research team from the Hong Kong University of Science and Technology (Guangzhou) proposed Minimal Test-Time Intervention (MTI), which includes two main methods: Selective CFG intervention and Lightweight negative-prompt guidance. MTI enhances the reasoning capabilities of LLMs during inference without requiring additional training [3][20].

Selective CFG Intervention
- This method reduces the uncertainty of high-entropy words, which often destabilize multi-step reasoning. The team found that erroneous LLM responses were associated with higher entropy, driven primarily by high-entropy words. By applying Classifier-free Guidance (CFG) only to these words, the method stabilizes the reasoning process while maintaining efficiency and improving performance (a minimal sketch follows below) [7][8].

Lightweight Negative-Prompt Guidance
- This approach reuses the key-value (KV) cache and injects negative prompts to save memory while maintaining a better unconditional space. The team observed that traditional CFG requires new KV caches, which reduces the efficiency of modern LLM inference accelerators. By treating the unconditional branch as a negative-prompt channel, they improved performance while conserving resources [9][10].

Experimental Results
- The research team conducted systematic tests across general tasks (Winogrande, MMLU-Pro), coding tasks (HumanEval, HumanEval+, LiveCodeBench), and math and science tasks (GPQA-Diamond, MATH500). Applying MTI to only 3.5% of high-entropy words on the Qwen3-14B-Reasoning model led to an average performance improvement of 1.58 points across all tasks [12][20].

Analysis of Findings
- The study revealed that some low-entropy words are resistant to CFG changes, as LLMs are highly confident in their outputs for these words. Not all words therefore require CFG intervention; the method primarily affects high-entropy words where the model lacks confidence [17][19].

Conclusion
- Overall, the work demonstrates that a small number of high-entropy words can significantly influence the correctness of LLM outputs. The proposed MTI method, combining Selective CFG intervention and Lightweight negative-prompt guidance, is easy to implement and can be integrated with modern acceleration frameworks and various decoding strategies. This approach not only enhances model performance across numerous tasks but also opens new avenues for exploring the potential of LLMs during the reasoning phase [20].
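A minimal sketch of the selective-CFG idea referenced above: apply classifier-free guidance only when the conditional next-token distribution is high-entropy, and leave confident steps untouched. The entropy threshold, guidance scale, and function name are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of selective CFG at decoding time. Threshold and guidance scale are assumed values.
import torch
import torch.nn.functional as F

def selective_cfg_logits(cond_logits: torch.Tensor,
                         uncond_logits: torch.Tensor,
                         entropy_threshold: float = 2.0,
                         guidance_scale: float = 1.5) -> torch.Tensor:
    """cond_logits / uncond_logits: [vocab_size] next-token logits from the conditional
    branch and from the unconditional (or negative-prompt) branch, respectively."""
    probs = F.softmax(cond_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()   # entropy in nats
    if entropy < entropy_threshold:
        return cond_logits                                    # model is confident: no intervention
    # high-entropy step: push the distribution away from the unconditional branch
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

The lightweight negative-prompt branch (reusing the KV cache for the unconditional pass) is not shown; here the unconditional logits are simply assumed to be available.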
Approaching $5 trillion! NVIDIA's late-night GTC keynote sends its market cap surging as the Vera Rubin superchip makes its first appearance
机器之心· 2025-10-29 01:07
Core Insights
- NVIDIA's CEO Jensen Huang outlined a vision for America's AI century during the GTC Washington event, emphasizing the company's leadership in AI infrastructure and innovation [3][4].
- Following Huang's keynote, NVIDIA's stock surged by 4.98%, increasing its market capitalization by over $230 billion to approximately $4.89 trillion, nearing the milestone of becoming the first company to reach a $5 trillion valuation [1].

Group 1: Vera Rubin Super Chip
- The Vera Rubin super chip was unveiled, featuring a Vera CPU and two powerful Rubin GPUs, with a total of 32 LPDDR system memory slots and HBM4 memory [8][11].
- The Rubin GPU is expected to enter mass production by October 2026, with performance significantly exceeding previous models, including 50 PFLOPS of FP4 compute and 288 GB of HBM4 memory [11][12].
- The Rubin Ultra platform, set to launch in late 2027, will raise performance to 15 exaflops for FP4 inference and 5 exaflops for FP8 training, a substantial increase over earlier models [14].

Group 2: Shift to GPU Accelerated Computing
- NVIDIA is transitioning from CPU to GPU accelerated computing, addressing the limitations of traditional computing models and leveraging parallel processing to enhance computational capabilities [15][17].
- The CUDA-X library is central to this strategy, providing tools for deep learning, data science, and quantum computing, among others [17][18].

Group 3: AI Native 6G Technology Stack
- Huang announced the development of an AI-native 6G wireless protocol stack, NVIDIA ARC, aimed at reducing reliance on foreign technology and enhancing national security [19][20].
- Nokia will integrate NVIDIA's technology into its future base stations, marking a significant collaboration in advancing U.S. telecommunications [21].

Group 4: Quantum Computing and NVQLink
- NVIDIA introduced NVQLink, a quantum-GPU interconnect technology that enables real-time, low-latency CUDA-Q calls from quantum processing units [26][28].
- The initiative aims to integrate quantum computing into scientific advancements, supported by collaborations with various quantum computing companies and U.S. Department of Energy laboratories [28].

Group 5: Accelerating U.S. Science
- NVIDIA is partnering with the U.S. Department of Energy to build seven new supercomputers, enhancing the nation's scientific discovery capabilities [30][32].
- The Solstice system at Argonne National Laboratory will deploy 100,000 NVIDIA Blackwell GPUs, establishing it as the largest AI-driven scientific platform for public research [32].

Group 6: AI Factories and New Job Creation
- Huang emphasized the emergence of AI factories, which are designed to generate and serve AI applications, leading to the creation of new job roles in AI engineering, robotics, and quantum science [33][34].
- The concept of "extreme codesign" was introduced to optimize costs and improve user experience in AI infrastructure development [37].

Group 7: Omniverse DSX and Digital Twins
- Omniverse DSX was launched as a blueprint for constructing and operating gigawatt-scale AI factories, highlighting the need for collaboration among hundreds of companies [40][41].
- Huang showcased how companies like Foxconn and Caterpillar are using digital-twin technology to enhance manufacturing processes [48].

Group 8: Autonomous Driving Initiatives
- NVIDIA is collaborating with Uber to deploy approximately 100,000 autonomous vehicles, with scaling planned from 2027 [49][50].
- The DRIVE AGX Hyperion 10 reference architecture will support L4-level autonomous driving, integrating human and robotic drivers into a unified network [51][53].
OpenAI completes its recapitalization; Altman says fully automated AI researchers will arrive by 2028
机器之心· 2025-10-29 01:07
Core Viewpoint
- OpenAI has completed a capital restructuring and simplified its organizational structure, with a focus on ensuring that artificial general intelligence (AGI) benefits all of humanity through collaboration between its nonprofit and for-profit entities [1][3][6].

Group 1: Organizational Structure and Ownership
- The nonprofit organization, now called the OpenAI Foundation, controls the for-profit entity, OpenAI Group, holding a 26% stake valued at approximately $130 billion [3][6].
- A further 47% of shares are held by current and former employees and investors; the arrangement makes the foundation one of the most resource-rich charitable organizations in history [3][6].
- OpenAI Group operates as a public benefit corporation with the same mission as the foundation, ensuring alignment between commercial success and the nonprofit's goals [7][11].

Group 2: Funding and Focus Areas
- The OpenAI Foundation plans to invest $25 billion in two primary areas: health and disease cures, and resilient AI technology solutions [6].
- The health initiative aims to accelerate breakthroughs in health diagnostics and treatments, including the creation of open-source health datasets [6].
- The AI resilience initiative seeks to establish a resilience layer for AI, similar to cybersecurity for the internet, to maximize benefits and minimize risks [6].

Group 3: Future Developments and Collaborations
- OpenAI aims to develop personal AGI that can assist users in both work and personal life, with fully automated AI researchers expected by 2028 [15][18].
- The partnership with Microsoft has evolved significantly: post-recapitalization, Microsoft will hold approximately 27% of OpenAI, a stake valued at around $135 billion [21][22].
- The partnership includes provisions for independent innovation and growth, with Microsoft retaining exclusive IP rights until AGI is achieved [21][23].
Why do 95% of AI agents fail in deployment? This roundtable surfaced some common pitfalls
机器之心· 2025-10-28 09:37
Core Insights
- 95% of AI agents fail when deployed in production environments, due to immature foundational frameworks, context engineering, security, and memory design rather than the intelligence of the models themselves [1][3].
- Successful AI deployments share a common trait: human-AI collaboration design, where AI acts as an assistant rather than a decision-maker [3][21].

Context Engineering
- Context engineering is not merely prompt optimization; it involves building a semantic layer, metadata filtering, feature selection, and context observability [3][12].
- A well-structured Retrieval-Augmented Generation (RAG) system is often sufficient, yet many existing systems are poorly designed, leading to common failure modes such as excessive indexing or insufficient signal support [8][9].

Memory Design
- Memory should be viewed as a design decision involving user experience, privacy, and system impact rather than just a feature [22][23].
- Effective memory design includes user preferences, team-level queries, and organizational knowledge, ensuring that AI can provide personalized yet secure interactions [27][29].

Trust and Governance
- Trust issues are critical for AI systems, especially in sensitive areas like finance and healthcare; successful systems incorporate human oversight and governance frameworks [18][21].
- Access control and context-specific responses are essential to prevent information leaks and ensure compliance [20][21].

Multi-Model Inference and Orchestration
- The emerging design pattern of model orchestration routes tasks to appropriate models based on complexity and requirements, enhancing performance and cost-effectiveness (a minimal routing sketch follows below) [32][34].
- Teams increasingly use a decision directed acyclic graph (DAG) approach to manage model interactions, ensuring that the system can adapt and optimize over time [34].

User Experience and Interaction
- Not all tasks require conversational interfaces; graphical user interfaces may be more efficient for certain applications [39][40].
- Natural language interfaces are most valuable when they lower the learning curve for complex tools, such as business intelligence dashboards [40][41].

Future Directions
- Key areas for development include context observability, portable memory systems, domain-specific languages (DSLs), and delay-aware user experiences [43][44][46].
- The next competitive barriers in generative AI will stem from advancements in memory components, orchestration layers, and context observability tools [49][52].
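As referenced in the orchestration section above, here is a toy complexity-based routing sketch. The tier names, prices, and keyword heuristic are invented for illustration, not any specific team's orchestration layer.

```python
# Toy model router: pick a model tier from a cheap complexity estimate of the request.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    approx_cost_per_1k_tokens: float  # USD, illustrative

ROUTES = {
    "simple":  Route("small-fast-model", 0.10),   # lookups, classification, short answers
    "medium":  Route("mid-tier-model",   0.50),   # summarization, standard RAG answers
    "complex": Route("frontier-model",   3.00),   # multi-step reasoning, code generation
}

def estimate_complexity(task: str) -> str:
    """Cheap heuristic stand-in; production systems often use a small classifier model here."""
    text = task.lower()
    if any(k in text for k in ("prove", "plan", "refactor", "debug", "step by step")):
        return "complex"
    if len(task.split()) > 60:
        return "medium"
    return "simple"

def route(task: str) -> Route:
    return ROUTES[estimate_complexity(task)]

if __name__ == "__main__":
    print(route("Classify this support ticket as billing or technical"))
    print(route("Plan a step by step migration of our billing service to a new database"))
```

In a DAG-style orchestration layer, a node like `route` would simply decide which downstream model node handles the request, with logging around it for observability.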
Musk's Grokipedia stumbles right out of the gate? Accused of copying Wikipedia, with Chinese support in shambles
机器之心· 2025-10-28 09:37
机器之心 report | 机器之心 editorial department

Grokipedia, which Musk had been teasing for months, has just gone live.

In short, Grokipedia is an open-source encyclopedia platform announced by xAI in September 2025, created expressly to challenge Wikipedia. In Musk's view, Wikipedia is not neutral enough and is full of bias, and the Grokipedia he intends to build is meant to surpass it in both accuracy and neutrality.

[Screenshot: a Grokipedia entry on Wikipedia, with bilingual section headings including "systematic coverage bias in topics and sources," "Reliability and Accuracy," "Evidence of left-leaning systemic bias in articles," and "Empirical studies on factual correctness," the last citing "A 2024 computational analysis of over 1,000 Wikipedia biographies of polit…"]
NeurIPS 2025 | VFMTok: the era of tokenizers driven by Visual Foundation Models has arrived
机器之心· 2025-10-28 09:37
Core Insights
- The article discusses the potential of using frozen Visual Foundation Models (VFMs) as effective visual tokenizers for autoregressive image generation, highlighting their ability to enhance image reconstruction and generation tasks [3][11][31].

Group 1: Traditional Visual Tokenizers
- Traditional visual tokenizers like VQGAN require training from scratch, yielding a latent space that lacks high-level semantic information and has high redundancy [4][7].
- The organization of the latent space in traditional models is chaotic, resulting in longer training times and the need for additional techniques like Classifier-Free Guidance (CFG) for high-fidelity image generation [7][12].

Group 2: Visual Foundation Models (VFMs)
- Pre-trained VFMs such as CLIP, DINOv2, and SigLIP2 excel at extracting rich semantic and generalizable visual features, and are primarily used for image-understanding tasks [4][11].
- The research team's hypothesis is that the latent features from these VFMs can also be used for image reconstruction and generation tasks [4][10].

Group 3: VFMTok Architecture
- VFMTok builds high-quality visual tokenizers on top of frozen VFMs, using multi-level feature extraction to capture both low-level details and high-level semantics [14][17].
- The architecture includes a region-adaptive quantization mechanism that improves token efficiency by focusing on consistent patterns within the image (a simplified quantization sketch follows below) [18][19].

Group 4: Experimental Findings
- VFMTok demonstrates superior performance in image reconstruction and autoregressive generation compared to traditional tokenizers, achieving better reconstruction quality with fewer tokens (256) [23][28].
- Autoregressive models converge significantly faster during training with VFMTok, outperforming classic setups such as VQGAN [24][26].

Group 5: CFG-Free Performance
- VFMTok performs consistently with or without CFG, indicating strong semantic consistency in its latent space and enabling high-fidelity class-to-image generation without additional guidance [33].
- The reduced token count yields roughly four times faster inference during generation [33].

Group 6: Future Outlook
- The findings suggest that leveraging the prior knowledge in VFMs is crucial for constructing high-quality latent spaces and developing the next generation of tokenizers [32].
- A unified tokenizer that is semantically rich and efficient across various generative models is highlighted as a future research direction [32].
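To make the "frozen VFM as tokenizer" pattern concrete, below is a simplified sketch of quantizing patch features from a frozen encoder against a learned codebook. This shows only the generic pattern: VFMTok's multi-level feature extraction and region-adaptive quantization are not reproduced, and the encoder call in the usage comment is a placeholder name, not a specific library API.

```python
# Sketch: quantize patch features from a *frozen* vision foundation model against a learned
# codebook (nearest-neighbor lookup + straight-through gradient). Codebook size and feature
# dimension are assumptions for illustration.
import torch
import torch.nn as nn

class CodebookQuantizer(nn.Module):
    def __init__(self, num_codes: int = 8192, dim: int = 768):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, feats: torch.Tensor):                      # feats: [batch, num_patches, dim]
        flat = feats.reshape(-1, feats.shape[-1])
        dists = torch.cdist(flat, self.codebook.weight)          # distance to every code
        ids = dists.argmin(dim=-1)
        quantized = self.codebook(ids).view_as(feats)
        # straight-through estimator: gradients pass upstream as if quantization were identity
        quantized = feats + (quantized - feats).detach()
        return quantized, ids.view(feats.shape[:-1])

# Usage with any frozen VFM that yields patch tokens (a DINOv2- or CLIP-style vision encoder);
# `frozen_vfm` is a hypothetical callable, not a real API.
# with torch.no_grad():
#     patch_feats = frozen_vfm(images)                           # e.g. [B, 256, 768]
# tokens, ids = CodebookQuantizer()(patch_feats)
```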
NeurIPS 2025 | Peking University and Xiaohongshu propose Uni-Instruct: one-step ImageNet image generation enters the FID 1.0 era!
机器之心· 2025-10-28 06:29
In recent years, one-step diffusion models have shone in image generation, text-to-video, image editing, and other areas thanks to their excellent generation quality and extremely high inference efficiency. The mainstream training approach is knowledge distillation, minimizing the distribution discrepancy between the student model and the teacher diffusion model. However, existing methods mainly follow two parallel theoretical routes:

- Methods based on KL-divergence minimization (e.g., Diff-Instruct [1], DMD [2]): fast convergence, but prone to mode collapse, which in turn hurts generation quality.
- Methods based on score-divergence minimization (e.g., SIM [3], SiD [4]): better distillation performance, but slower training convergence.

(Generic textbook forms of these two objectives are written out after this summary.)

The two routes appear theoretically disconnected. Can they be unified under a common theoretical framework? And if so, can such a unified framework deliver stronger model performance?

Paper title: Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Affiliations: College of Future Technology, Peking University; National Biomedical Imaging Center; Academy for Advanced Interdisciplinary Studies, Peking University; Xiaohongshu hi-lab

Chinese researchers from Peking University, Xiaohongshu hi lab, and other institutions jointly propose Uni-Instruct, a unified theoretical framework for one-step generation…
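For reference, the two families contrasted above are commonly written with the following objectives for a student generator distribution $p_\theta$ and a teacher diffusion distribution $q$. These are standard, simplified (single-noise-level) forms of the divergences, not Uni-Instruct's unified objective.

```latex
% Generic forms of the two distillation objectives named above; standard divergence
% definitions in simplified form, not Uni-Instruct's framework.
\begin{align}
\mathcal{L}_{\mathrm{KL}}(\theta)
  &= D_{\mathrm{KL}}\!\left(p_\theta \,\Vert\, q\right)
   = \mathbb{E}_{x \sim p_\theta}\!\left[\log \frac{p_\theta(x)}{q(x)}\right], \\[4pt]
\mathcal{L}_{\mathrm{score}}(\theta)
  &= \mathbb{E}_{x \sim p_\theta}\!\left[\big\Vert \nabla_x \log p_\theta(x) - \nabla_x \log q(x) \big\Vert_2^2\right].
\end{align}
```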
Top minds in large models gather for a hardcore open-source meetup: the SGLang community holds its first Meetup in China
机器之心· 2025-10-28 06:29
Core Insights
- The PyTorch Conference 2025 showcased a vibrant community and significant developments in deep learning, particularly highlighting SGLang's contributions and potential in the industry [1][3][4].

SGLang Overview
- SGLang, an open-source high-performance inference engine for large language models and vision-language models, originated from RadixAttention and is incubated by the non-profit organization LMSYS. It offers low-latency, high-throughput inference across environments ranging from a single GPU to large distributed clusters (a minimal usage sketch follows below) [7][8].

Community Engagement
- The first Meetup event in Beijing, co-hosted by SGLang, Meituan, and Amazon Web Services, attracted numerous contributors, developers, and scholars, indicating a strong community presence and development potential [4][8].

Technical Developments
- The Meetup featured technical discussions of SGLang's architecture, including advances in KV cache management, piecewise CUDA Graph, and speculative decoding, aimed at improving efficiency and compatibility [21][22].
- SGLang's quantization strategies were also discussed, focusing on expanding the range of applications and optimizing model performance [34][35].

Application and Practice
- Industry applications of SGLang were presented, including its integration with Baidu's ERNIE 4.5 model for large-scale deployment and optimization in search scenarios [41][42].
- The use of SGLang in WeChat's search function was highlighted, emphasizing the need for high throughput and low latency in the user experience [44].

Future Directions
- The SGLang roadmap includes further integration with various hardware and software stacks, aiming to enhance stability and compatibility across platforms [22][35].
- The SpecForge framework, developed by the SGLang team to accelerate large language model inference, has been adopted by major companies such as Meituan and NVIDIA [57][58].
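As a minimal usage sketch for the engine described above: SGLang's documented entry point launches an OpenAI-compatible server, which can then be queried with a standard client. The model name, port, and prompt below are arbitrary examples, and the flags may differ across SGLang versions.

```python
# Minimal SGLang usage sketch (assumes SGLang is installed and the flags match your version).
# 1) Launch the inference server (shell):
#    python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000
# 2) Query it through the OpenAI-compatible endpoint it exposes:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # local server, key unused
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",          # arbitrary example model
    messages=[{"role": "user", "content": "Summarize RadixAttention in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```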
Another breakthrough from Jürgen, the father of LSTM: the "Huxley-Gödel Machine" teaches AI to evolve itself
机器之心· 2025-10-28 06:29
Editors: 冷猫, 陈陈 | 机器之心 report

One of the ultimate goals of artificial general intelligence is to build AI agents that can learn and improve on their own.

The goal itself is hardly new. Back in 2003, a theoretical model of self-improving agents was already proposed by Jürgen Schmidhuber, the renowned "father of modern AI", under the name of the Gödel machine.

The Gödel machine is a theoretical model of a self-improving general intelligence system, inspired by Kurt Gödel's incompleteness theorems. Its core idea is that, like a mathematician, the machine can formally prove that a modification to its own program will yield higher long-term payoff, and then safely rewrite itself.

Jürgen Schmidhuber is a German computer scientist known for his work in artificial intelligence, deep learning, and artificial neural networks. He is currently co-director of the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) and director of the AI initiative at King Abdullah University of Science and Technology. In 1997, Jürgen Schmidhuber published the long short-term memory (LSTM) paper. In 2011, his team at IDSIA achieved a dramatic GPU speedup of convolutional neural networks (CNNs), an approach built on earlier CNN designs by Yann LeCun and others that has since become central to computer vision.

Put simply, it is a …