Musk's xAI Releases Grok 4: Training Compute Up 100-Fold, Leading the Runner-Up by a Factor of Two on Multiple Tests
Feng Huang Wang· 2025-07-10 06:20
Core Insights
- xAI has launched its latest large language model, Grok 4, which shows significant performance improvements over its predecessor, Grok 3, with a 100-fold increase in training compute [1]
- Grok 4 achieved a 25% problem-solving rate on the "Humanity's Last Exam" benchmark, while the multi-agent version, Grok 4 Heavy, exceeded 50% [1]
- The company is focusing on enhancing multimodal understanding and has released an API for Grok 4 that supports a 256K context length [2]

Model Performance
- Grok 4 demonstrates superior reasoning capabilities on standardized tests, including GPQA and AIME, and achieved a perfect score on the LiveCodeBench test [2]
- The model integrates tool usage directly into its training process, improving reliability in complex task handling [2]

Commercialization Efforts
- xAI has introduced a subscription service, SuperGrok Heavy, allowing users to access both Grok 4 and Grok 4 Heavy [3]
- The company plans to develop a dedicated coding model and to begin training a video generation model on more than 100,000 H200 GPUs in the coming weeks [3]
- The release of Grok 4 marks a significant breakthrough in the competitive landscape of large language models, particularly in reasoning and multi-agent collaboration [3]
According to The Verge: OpenAI's open-source language model is about to be released.
news flash· 2025-07-10 05:44
Core Insights
- OpenAI is set to release an open-source language model [1]

Group 1
- The upcoming release is expected to enhance accessibility and collaboration within the AI community [1]
Musk Releases Grok 4: Taking Aim at GPT-5 While the Chief Scientist Quits on the Eve of Launch
Feng Huang Wang· 2025-07-10 05:31
Core Viewpoint
- Elon Musk officially launched the latest language model from his xAI team, Grok 4, amid controversies including the resignation of xAI's chief scientist and previous issues with the model generating racist content [1][2]

Group 1: Model Features and Capabilities
- Grok 4 showcases significant upgrades, including multimodal capabilities for processing text and images, with potential future support for video processing [2]
- The model introduces Grok 4 Code for code writing and debugging, and enhances voice interaction for a more natural conversational experience [2]
- Grok 4 will utilize a tool called DeepSearch for real-time internet searches, integrating data from the X platform to provide up-to-date information [2]
- A unique feature of Grok 4 is its enhanced understanding of internet culture, slang, and memes, aiming to be a more relatable AI assistant [2]

Group 2: Market Position and Challenges
- Despite its powerful features, Grok 4 faces a credibility crisis due to previous versions producing biased content, raising concerns about xAI's commitment to product safety and testing [2]
- Musk positions xAI as a challenger to what he refers to as "woke" AI models like ChatGPT and Gemini, yet he remains largely silent on the current controversies [2]
- In contrast to competitors like OpenAI and Google, which prioritize reliability and safety, xAI opts for a more avant-garde approach with fewer restrictions, a risk the market has yet to evaluate [3]
New Breakthrough for a Unified VLA Architecture: Autoregressive World Models Lead Embodied Intelligence
机器之心· 2025-07-10 04:26
Core Viewpoint
- The article discusses the development of a new unified Vision-Language-Action (VLA) model architecture called UniVLA, which enhances the integration of visual, language, and action signals for improved decision-making in embodied intelligence tasks [4][5][13]

Group 1: Model Architecture and Mechanism
- UniVLA is based on a fully discrete, autoregressive mechanism that natively models visual, language, and action signals, incorporating world-model training to learn temporal information and causal logic from large-scale videos [5][9][14]
- The framework transforms visual, language, and action signals into discrete tokens, creating interleaved multimodal temporal sequences for unified modeling (see the sketch after this summary) [9][10]

Group 2: Performance and Benchmarking
- UniVLA has set new state-of-the-art (SOTA) records across major embodied intelligence benchmarks such as CALVIN, LIBERO, and SimplerEnv, demonstrating strong performance advantages [18][21]
- On the CALVIN benchmark, UniVLA achieved an average score of 95.5%, significantly outperforming previous models [19]

Group 3: Training Efficiency and Generalization
- The world-model post-training stage significantly enhances downstream decision-making performance without relying on extensive action data, using only large amounts of video data for efficient learning [14][15]
- The model supports unified training for various tasks, including visual understanding, video generation, and action prediction, showcasing its versatility and data scalability [10][24]

Group 4: Future Directions
- The article suggests exploring deeper integration of the UniVLA framework with multimodal reinforcement learning to enhance its perception, understanding, and decision-making capabilities in open-world scenarios [24]
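The interleaved token layout summarized in Group 1 can be pictured with a small sketch. The snippet below is a minimal illustration, not UniVLA's actual implementation: the tokenizer stand-ins, vocabulary offsets, tokens-per-frame counts, and the exact ordering of language, vision, and action tokens are all assumptions made for clarity.

```python
from typing import List

# Hypothetical token-id offsets marking three separate discrete vocabularies.
LANG_OFFSET, VISION_OFFSET, ACTION_OFFSET = 0, 10_000, 30_000

def tokenize_language(instruction: str) -> List[int]:
    # Stand-in for a text tokenizer; a real system would use a trained BPE/SentencePiece model.
    return [LANG_OFFSET + (hash(w) % 5_000) for w in instruction.split()]

def tokenize_frame(frame_id: int) -> List[int]:
    # Stand-in for a discrete visual tokenizer (e.g. a VQ codebook); 4 tokens per frame here.
    return [VISION_OFFSET + (frame_id * 4 + k) % 20_000 for k in range(4)]

def tokenize_action(step_id: int) -> List[int]:
    # Stand-in for discretized action bins; 2 tokens per control step here.
    return [ACTION_OFFSET + (step_id * 2 + k) % 1_000 for k in range(2)]

def build_interleaved_sequence(instruction: str, num_steps: int) -> List[int]:
    """Flatten language, per-step vision, and per-step action tokens into one
    autoregressive sequence: [lang..., vis_t0..., act_t0..., vis_t1..., act_t1..., ...]."""
    seq = tokenize_language(instruction)
    for t in range(num_steps):
        seq += tokenize_frame(t)    # observation tokens for step t
        seq += tokenize_action(t)   # action tokens recorded or predicted at step t
    return seq

print(len(build_interleaved_sequence("pick up the red block", num_steps=3)))
```

A single decoder trained on sequences like this can be asked to continue with either vision tokens (a world-model rollout) or action tokens (a policy), which is the kind of unified modeling the summary describes.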
ICML 2025 | Fitting AI with an "Intelligence Upgrade Plugin"! Alibaba Security and Tsinghua University's D-MoLE Lets Models Evolve Dynamically During Continual Learning
机器之心· 2025-07-10 04:26
The first author of this paper is Ge Chendi, a second-year master's student in the Department of Computer Science at Tsinghua University, whose research covers multimodal large language models, automated machine learning, and graph machine learning. The main collaborators are Fan Jiapei, Huang Longtao, and Xue Hui from Alibaba Group's Security Department. The corresponding authors are Professor Zhu Wenwu and Associate Researcher Wang Xin of Tsinghua University.

Recently, joint research on continual multimodal instruction tuning by the Interactive Content Security team of Alibaba Group's Security Department and Tsinghua University was accepted at ICML 2025, a top machine learning conference. This year's ICML received 12,107 submissions, with an acceptance rate of 26.9%.

1. Research Background

Multimodal Large Language Models (MLLMs) combine modality encoders for vision, speech, and other inputs with a text generation model, demonstrating strong capabilities for processing multimodal data. In practice, however, a pretrained MLLM continually faces new adaptation requirements as user needs and task types change. If it is fine-tuned directly on each new task, the model often suffers from catastrophic forgetting, i.e., it loses previously acquired capabilities.

How to let an MLLM continually adapt to new tasks while retaining past knowledge therefore becomes a core challenge, a problem known as Continual Multimodal Instruction Tuning ...
Diffusion Language Models Write Code! 10x Faster Than Autoregression
量子位· 2025-07-10 03:19
Wen Le, from Aofei Si. QbitAI | WeChat official account QbitAI

Who says diffusion models can only generate images and videos? They can now write high-quality code, and do it faster than traditional large models.

Inception Labs has launched Mercury, a new commercial-grade large language model built on diffusion technology.

Mercury breaks the autoregressive constraint of generating tokens one by one, from left to right. Instead it works "from noise to structured output", predicting tokens at all positions at once, which raises generation speed (a toy sketch of this loop follows below).

In doing so, Mercury also addresses the autoregressive problem that "once generated, it is hard to go back and adjust". A diffusion model does not only condition on what has already been generated; it can dynamically correct and revise its output during generation, giving it greater flexibility.

Mercury uses the mature Transformer as its neural network backbone and combines it with diffusion's parallel generation, preserving compatibility with existing large-model tooling while breaking through the speed limit of token-by-token autoregressive generation.

Although it adopts diffusion techniques, the Mercury model family retains the Transformer architecture. This ensures the models can directly reuse the efficient training and inference optimizations developed for large language models in recent years (such as low-level operator optimizations and hyperparameter tuning tools).

Benchmark results show that, on the same programming tasks, Mercury generates code up to 10x faster than traditional tools, substantially shortening the development cycle ...
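To make the "predict everything at once, then refine" idea concrete, here is a toy mask-and-refine decoding loop. It is a generic sketch of parallel diffusion-style decoding, not Mercury's actual algorithm; the dummy denoiser, the step count, and the confidence-based commit rule are illustrative assumptions.

```python
import numpy as np

def toy_denoiser(tokens: np.ndarray, vocab_size: int, rng) -> np.ndarray:
    """Stand-in for the denoising network: returns logits for every position in parallel.
    A real diffusion LM would run a Transformer over the partially filled sequence."""
    return rng.normal(size=(tokens.shape[0], vocab_size))

def parallel_decode(seq_len: int = 16, vocab_size: int = 100, steps: int = 4, seed: int = 0):
    rng = np.random.default_rng(seed)
    MASK = -1
    tokens = np.full(seq_len, MASK)              # start from a fully "noised" (masked) sequence
    undecided = np.ones(seq_len, dtype=bool)

    for step in range(steps):
        logits = toy_denoiser(tokens, vocab_size, rng)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        preds, conf = probs.argmax(axis=-1), probs.max(axis=-1)

        # Every undecided position gets a proposal at once (no left-to-right order);
        # only the most confident proposals are committed, the rest stay masked and
        # can still change in later refinement steps.
        remaining = steps - step
        n_commit = int(np.ceil(undecided.sum() / remaining))
        candidates = np.flatnonzero(undecided)
        commit = candidates[np.argsort(-conf[candidates])[:n_commit]]
        tokens[commit] = preds[commit]
        undecided[commit] = False

    return tokens

print(parallel_decode())
```

Because all positions are denoised jointly at each step, wall-clock latency scales with the number of refinement steps rather than the sequence length, which is where the reported speedups over token-by-token decoding come from.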
CICC: How Can Large Models Be Used to Forecast Macroeconomic Indicators in Real Time?
中金点睛· 2025-07-09 23:59
Core Viewpoint
- The article discusses a real-time forecasting framework driven by large language models (LLMs) to predict macroeconomic indicators, addressing the inherent lag in traditional macroeconomic data collection and reporting processes [1][7]

Group 1: Real-time Forecasting Methods
- Macroeconomic indicators typically arrive with a delay because data collection and validation are time-consuming, so figures are often released only in the following month or quarter [2][7]
- Three common methods for addressing the lag in macroeconomic data are outlined:
  1. Periodic Lagging Method: use the most recently published data; reliable, but amounts to linear extrapolation [8]
  2. Dynamic Lagging Method: adjust data based on historical release patterns; also relies on linear extrapolation [8]
  3. Real-time Forecasting Method: build models for real-time state prediction, which may introduce randomness [8]

Group 2: Specific Forecasting Techniques
- High-Frequency Data Splitting: uses dynamic high-frequency macro data to update predictions of low-frequency macro data, exemplified by the GDPNow model; interpretable, but requires extensive domain knowledge and may overfit to noise in high-frequency data [9]
- SARIMAX Model: a seasonal autoregressive integrated moving average model with exogenous variables, adding seasonal parameters and external regressors to improve predictive power; suitable for stable, high-frequency indicators with limited external shocks (see the code sketch after this summary) [10][14]
- LLMs for Text Interpretation: use LLMs to analyze unstructured text (e.g., macro news, analyst reports) and generate predictive signals from semantic relationships and logical reasoning; this captures market reactions to sudden events more quickly than traditional models [3][15]

Group 3: Performance of Forecasting Models
- Autoregressive predictions offer limited accuracy gains for indicators weakly correlated with their own previous values, such as month-on-month CPI and new RMB loans; strongly correlated indicators (correlation ≥ 0.8) can simply use lagged data without modeling [4][27]
- LLM enhancements significantly improve predictive accuracy for several indicators, with correlation rising from -0.1 to 0.9 for new RMB loans and from 0.37 to 0.72 for export value [5][35]

Group 4: Conclusion and Recommendations
- The article concludes with a recommended approach for real-time forecasting of lagging macroeconomic data:
  1. For indicators with high correlation to previous values, use lagged data directly
  2. For stable indicators with weak trends, apply the SARIMAX model with seasonal adjustments
  3. When the other methods are unsuitable, use LLMs in conjunction with news or report data for real-time predictions [45]
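As an illustration of the SARIMAX approach described under Group 2, the snippet below fits a seasonal ARIMA model with one exogenous regressor using statsmodels and produces a one-step-ahead nowcast. The synthetic series, the (1, 0, 1)(1, 0, 1, 12) order, and the exogenous proxy are placeholder assumptions, not the specifications used in the CICC report.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for a stable, seasonal monthly indicator.
rng = np.random.default_rng(42)
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
seasonal = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 10)
y = pd.Series(100 + 0.05 * np.arange(120) + seasonal + rng.normal(0, 0.2, 120), index=idx)

# Hypothetical exogenous regressor, e.g. a monthly aggregate of a related high-frequency series.
exog = pd.Series(0.5 * seasonal + rng.normal(0, 0.1, 120), index=idx, name="hf_proxy")

# order=(p, d, q) handles short-run dynamics; seasonal_order=(P, D, Q, s) captures the yearly cycle.
model = SARIMAX(y.iloc[:-1], exog=exog.iloc[:-1],
                order=(1, 0, 1), seasonal_order=(1, 0, 1, 12))
fitted = model.fit(disp=False)

# Nowcast the not-yet-released month, conditioning on the already observable exogenous reading.
nowcast = fitted.forecast(steps=1, exog=exog.iloc[-1:])
print(nowcast)
```

Consistent with the report's recommendation, when an indicator's correlation with its own previous value is already high (the ≥ 0.8 case in Group 3), this model can be skipped and the lagged value used directly.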
July 10: OpenAI's open language model is reportedly set to debut as early as next week.
news flash· 2025-07-09 16:20
Core Insights
- OpenAI's open language model is expected to debut as early as next week [1]

Company Summary
- OpenAI is preparing to launch its open language model, indicating a significant development in the field of artificial intelligence [1]