2.18x inference speedup for very large models! SGLang and Meituan's tech team open-source a speculative sampling training framework
量子位· 2025-07-26 09:01
Core Viewpoint
- SpecForge is an open-source training framework designed for speculative sampling, specifically tailored for large models, achieving a 2.18x inference acceleration [1][15].

Group 1: SpecForge Overview
- SpecForge is developed by the SGLang team in collaboration with Meituan's search recommendation platform and Cloudsway.AI [1].
- The framework is built to address the challenges posed by the increasing size of models, which often leads to lower inference efficiency [4][6].
- SpecForge integrates deeply with the SGLang inference engine, providing a seamless training and inference process for speculative sampling [5][7].

Group 2: Technical Features
- The framework incorporates Eagle3, an advanced speculative sampling method that enhances inference speed by training a lightweight draft model to predict token distributions accurately [7].
- SpecForge supports various mainstream models, including complex MoE layers and Transformer variants, ensuring broad applicability [7].
- It features scalable distributed training through Fully Sharded Data Parallel (FSDP) and Tensor Parallelism (TP), optimizing resource utilization on GPU clusters [7][14].

Group 3: Training Modes and Efficiency
- SpecForge offers two training modes, Online and Offline, allowing users to choose based on their specific needs and resource availability [10][17].
- The Training-Time Test (TTT) architecture enhances the robustness of the draft model, encapsulating complex processes to simplify implementation for users [9].
- The framework is designed with a focus on memory-efficient training, significantly reducing memory overhead even for trillion-parameter models [7].

Group 4: Experimental Validation
- The effectiveness of SpecForge was validated through experiments on datasets like ShareGPT and UltraChat, demonstrating compatibility with the Eagle3 architecture [15].
- The draft models trained using SpecForge achieved a notable 2.18x inference acceleration on the MT-Bench benchmark [15].
Group 5: Future Developments
- SpecForge's roadmap includes plans to support additional model architectures and integrate visual-language models (VLM) into the framework [22].
- The team aims to enhance training efficiency through improved parallel strategies and kernel optimizations [22].
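The draft-then-verify loop at the heart of speculative sampling can be sketched in a few lines. Below is a minimal, framework-agnostic Python sketch of the classic accept/reject scheme; it is not SpecForge's actual code, and `target_p` / `draft_q` are hypothetical stand-ins for the target and draft models' next-token distributions:

```python
import random

def speculative_step(target_p, draft_q, prompt, k=4, rng=random):
    """One round of draft-then-verify speculative sampling (illustrative).

    target_p / draft_q: callables mapping a token sequence to a
    dict {token: probability} over the next token.
    Returns the list of tokens committed this round.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    seq = list(prompt)
    proposed = []
    for _ in range(k):
        q = draft_q(seq)
        tok = rng.choices(list(q), weights=list(q.values()))[0]
        proposed.append(tok)
        seq.append(tok)

    # 2. The target model verifies each proposal; accept with prob min(1, p/q).
    accepted = list(prompt)
    for tok in proposed:
        p = target_p(accepted)
        q = draft_q(accepted)
        if rng.random() < min(1.0, p.get(tok, 0.0) / max(q.get(tok, 1e-12), 1e-12)):
            accepted.append(tok)
        else:
            # Rejected: resample from the residual max(0, p - q), then stop.
            residual = {t: max(p.get(t, 0.0) - q.get(t, 0.0), 0.0) for t in p}
            total = sum(residual.values()) or 1.0
            toks, ws = zip(*residual.items())
            accepted.append(rng.choices(toks, weights=[w / total for w in ws])[0])
            break
    return accepted[len(prompt):]
```

When the draft distribution closely matches the target, most proposals are accepted, so the expensive target model runs once per round instead of once per token; that acceptance rate is what a trained draft model like Eagle3's improves.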
After 800,000 people queued for invite codes, Lovart opens up with upgraded features! A top-tier design agent indeed, going viral on day one
量子位· 2025-07-26 07:33
Core Viewpoint
- Lovart has officially launched its platform, which features a new mode called "ChatCanvas," allowing users to collaborate with AI designers on a shared canvas, enhancing creativity and productivity in design tasks [2][4][10].

Group 1: Product Features
- Lovart's "ChatCanvas" is described as a variant of "Figma + Notion + ChatGPT," enabling users to create and modify designs interactively with AI [4][59].
- Users can generate designs by simply inputting commands, and the platform provides multiple design options based on user feedback [15][66].
- The platform supports batch processing, allowing users to make multiple modifications at once, enhancing efficiency [31][32].

Group 2: User Experience
- Lovart emphasizes user control and communication, ensuring that the AI understands user needs before executing design changes [32][60].
- The system creates independent "Chat Frames" for different tasks, preventing context confusion and maintaining organization [52][54].
- Users can seamlessly switch between tasks and view all project outputs in one place, including images, videos, and layouts [56][58].

Group 3: Market Position and Trends
- Lovart addresses a significant pain point in the creative industry by providing an all-in-one solution that automates the entire design process, allowing individual creators to function like a professional design team [81][82].
- The platform's timing aligns with a shift in the AI market from model development to practical applications, highlighting the demand for usable AI tools [84].
- Lovart's approach represents a fundamental shift from user experience (UX) to agent experience (AX), focusing on relationships and continuous learning from user interactions [84].

Group 4: Team and Background
- Lovart is developed by a Chinese team, with key figures including Wang Haofan and Chen Mian, showcasing China's strength in both foundational AI model development and innovative applications [87][89].
The king of deployed non-Transformer architectures surfaces at WAIC Shanghai with offline intelligence and native memory
量子位· 2025-07-26 06:34
Core Viewpoint
- The article discusses the advancements made by RockAI in developing a new AI model architecture that operates offline and possesses memory capabilities, marking a significant shift from traditional Transformer-based models [6][10][12].

Group 1: RockAI's Innovations
- RockAI has introduced the Yan 2.0 Preview model, which features a "native memory" capability allowing it to learn and evolve continuously through user interactions [11][12].
- The model operates entirely offline, demonstrating effective performance in tasks such as learning new actions and playing games without external control [6][8][11].
- The architecture is designed specifically for edge devices, enabling efficient operation without relying on cloud resources, which is crucial for devices with limited computational power [30][48].

Group 2: Memory Mechanism
- Yan 2.0 Preview incorporates a memory module that allows for dynamic updating and retrieval of information, enabling the model to forget outdated knowledge while integrating new insights [20][23].
- The model's memory retrieval mechanism selects the most relevant memories to generate outputs, enhancing its reasoning capabilities [23][24].
- This approach contrasts with traditional models that are static and unable to learn post-deployment, positioning Yan 2.0 as a more adaptive and intelligent system [14][17].

Group 3: Market Position and Future Directions
- RockAI is positioned as a leader in the non-Transformer architecture space, having successfully deployed its models on various edge devices without the need for model compression or quantization [58][60].
- The company aims to create a collective intelligence framework where multiple models can collaborate and evolve, moving towards a more decentralized AI ecosystem [65][66].
- The shift away from Transformer models is seen as a response to the limitations of current architectures, with RockAI advocating for simpler algorithms that require less computational power and data [28][37].
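The article describes the memory module only at a high level: retrieve the memories most relevant to the current input, write new ones, and let old ones fade. A toy illustration of that retrieve/write/forget cycle might look like the following; every name here is hypothetical and this is not RockAI's design:

```python
import math

class MemoryStore:
    """Toy top-k memory retrieval sketch (hypothetical, not Yan's internals).

    Memories are (embedding, payload) pairs; retrieval returns the payloads
    whose embeddings are most similar to the query vector.
    """
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = []  # list of (vector, payload), oldest first

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    def write(self, vec, payload):
        # Crude forgetting: drop the oldest entry once capacity is reached.
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)
        self.entries.append((vec, payload))

    def retrieve(self, query, k=3):
        ranked = sorted(self.entries,
                        key=lambda e: self._cosine(query, e[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]

store = MemoryStore()
store.write([1.0, 0.0], "likes tea")
store.write([0.0, 1.0], "lives in Shanghai")
print(store.retrieve([0.9, 0.1], k=1))  # → ['likes tea']
```

A learned system would update embeddings and decide what to forget with trainable components rather than age, but the contrast with a frozen, post-deployment-static model is the same.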
Open-source Qwen takes three crowns in one week, hammering closed-source models! SOTA across base models, reasoning, and coding
量子位· 2025-07-26 05:06
Core Insights
- The article highlights the rapid advancements in open-source AI models, particularly focusing on the Qwen3 series, which has achieved significant milestones in performance and capabilities [1][2][3].

Group 1: Model Performance
- The newly released Qwen3-235B-A22B-Thinking-2507 model has been recognized as the "strongest open-source model globally," surpassing top closed-source models like Gemini-2.5 Pro and o4-mini [3][7].
- On the "Humanity's Last Exam" benchmark, the latest model scored 18.2, up from 11.8 in the previous version, outperforming competitors such as DeepSeek-R1-0528 and OpenAI o4-mini [13][14].
- The Qwen3 series has achieved state-of-the-art (SOTA) results across benchmarks including MMLU-Pro, GPQA, and LiveCodeBench, demonstrating superior performance in knowledge, reasoning, and programming tasks [11][16][32].

Group 2: Open-Source Impact
- The rapid release of three models in a short period has positioned Qwen3 as a leader in the open-source AI landscape, with significant interest and usage reflected in API call volumes exceeding 100 billion tokens [6][31].
- The article emphasizes that advancements in open-source AI, particularly from Chinese companies like Alibaba, are reshaping the global landscape, with Qwen models surpassing previous leaders like the Llama series [33][37].
- Alibaba plans to invest over 380 billion yuan in cloud and AI hardware infrastructure over the next three years, indicating a strong commitment to enhancing its AI capabilities [38].

Group 3: Industry Recognition
- The achievements of the Qwen3 series have garnered attention from industry leaders, with discussions highlighting the success of open-source models and their potential to challenge established closed-source counterparts [29][36].
- The article notes that the pace of development in China's open-source AI sector is rapidly closing the gap with closed-source models, suggesting a shift in the competitive landscape [39][40].
Stanford's large-model reasoning course is now free, taught by the founder of Google's reasoning team
量子位· 2025-07-25 07:59
Core Viewpoint
- The article discusses the reasoning capabilities of large language models (LLMs) and emphasizes the importance of intermediate reasoning steps in enhancing model confidence and accuracy in problem-solving [5][10][34].

Group 1: Importance of Reasoning in LLMs
- Reasoning in LLMs refers to the intermediate thought processes that occur before arriving at a final answer, which can significantly improve the model's ability to solve complex problems [5][11].
- Introducing a chain of thought (CoT) allows LLMs to tackle inherently serial problems without needing to expand the model size, thus bridging the gap between Transformers and Turing machines [12][13].
- The presence of reasoning steps increases the accuracy and reliability of answers, reducing the likelihood of random guessing [14][17].

Group 2: Enhancing Model Confidence
- Answers derived from reasoning processes lead to greater confidence in the model's outputs, as they are based on logical deductions rather than mere guesses [19][20].
- Denny Zhou highlights that pre-trained models possess reasoning capabilities even without fine-tuning, although these outputs may not be prioritized in greedy decoding [21][24].

Group 3: Methods to Improve Reasoning
- The CoT-decoding method selects reasoning paths from top-k alternatives, enhancing performance on reasoning tasks and approaching the effectiveness of instruction-tuned models [26].
- Supervised fine-tuning (SFT) involves training models on human-written step-by-step solutions, but it may lack generalization across new scenarios [27][28].
- Reinforcement learning fine-tuning has emerged as a powerful method for eliciting reasoning, focusing on generating longer responses and improving model performance through iterative training [31].

Group 4: Future Directions
- Denny Zhou identifies key areas for future breakthroughs, including addressing tasks with non-unique verifiable answers and developing practical applications beyond benchmark testing [35][40].
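The CoT-decoding idea of ranking alternative decoding paths by how confidently the model produces the final answer can be illustrated with a toy scorer. This sketch uses the average top-1 minus top-2 probability margin over the answer tokens as the confidence signal; the interface is invented for illustration and is not the paper's API:

```python
def cot_decode(paths):
    """Pick the decoding path whose answer tokens the model is most sure of.

    `paths` is a list of (text, answer_token_distributions) pairs, where each
    distribution is a dict {token: probability} for one answer-span token.
    Confidence is the mean top-1 minus top-2 probability margin, in the
    spirit of CoT-decoding (toy interface, not the original implementation).
    """
    def confidence(dists):
        margins = []
        for d in dists:
            probs = sorted(d.values(), reverse=True)
            top2 = probs[1] if len(probs) > 1 else 0.0
            margins.append(probs[0] - top2)
        return sum(margins) / len(margins)

    return max(paths, key=lambda p: confidence(p[1]))[0]

# A path that reasons first typically yields a more peaked answer distribution.
direct = ("42", [{"42": 0.4, "41": 0.35}])
reasoned = ("6*7=42, so 42", [{"42": 0.9, "41": 0.05}])
print(cot_decode([direct, reasoned]))  # → 6*7=42, so 42
```

This is why the method surfaces latent reasoning without any fine-tuning: among the top-k first-token continuations, the paths that happen to contain a chain of thought tend to end in higher-margin answers, and the scorer selects them.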
Not afraid of poaching! Google posts a group photo of its IMO gold medal team, tagging each member's contact info
量子位· 2025-07-25 07:59
Core Viewpoint
- Google DeepMind is actively responding to competitive pressures, particularly from Meta, around its International Mathematical Olympiad (IMO) 2025 effort, showcasing its team and achievements despite recent talent losses to competitors [2][3][4].

Group 1: Team Dynamics and Competitor Actions
- Google recently won an IMO gold medal, but Meta quickly recruited three core team members from DeepMind [2][3].
- The DeepMind team, led by Thang Luong, publicly shared a team photo, which can be seen as both a response to Meta's actions and a display of confidence [3][4].
- Notably, the three individuals recruited by Meta were absent from the team photo, indicating a potential rift or shift in team dynamics [8][17].

Group 2: Preparation for IMO
- In the lead-up to the IMO 2025, DeepMind's scientists gathered from various locations, including Mountain View, New York, and Singapore, to finalize their preparations [11].
- Thang Luong emphasized that the week leading up to the competition was crucial for achieving significant breakthroughs [11].
- The team integrated their previous research and methodologies to conduct an intensive training session, which was described as a "legendary" effort [10][11].

Group 3: Technical Achievements
- The team completed the final training of the Gemini Deep Think model just two days before the IMO, achieving peak performance [13].
- The model demonstrated impressive capabilities not only in mathematical reasoning but also in code generation and other complex reasoning tasks [14].

Group 4: Key Team Members
- The recently announced IMO gold medal team consists of 16 members, including four Chinese members, while the three who left for Meta are not included [17].
- Yi Tay, a co-leader of the Deep Think IMO team, has a strong background in major Google models and previously left to start a company but returned due to personal circumstances [21][25].
- Other notable team members include Quoc Le, a co-founder of Google Brain, and several researchers with prestigious academic backgrounds from institutions like MIT and Stanford [27][29][41].
WAIC sneak peek: a financial "dark horse" large model beats DeepSeek to set a new SOTA, paper now online
量子位· 2025-07-25 05:38
Core Viewpoint
- The article discusses the advancements in AI models, particularly focusing on Ant Group's financial reasoning model, Agentar-Fin-R1, which aims to address specific challenges in the financial sector and achieve state-of-the-art (SOTA) performance in various benchmarks [1][4][56].

Group 1: Model Overview
- Ant Group's financial reasoning model, Agentar-Fin-R1, comes in two parameter versions: 8B and 32B [10].
- The model is designed to tackle industry-specific challenges in financial applications, such as data quality, hallucination, and compliance [13][16].
- Agentar-Fin-R1 has achieved top performance across all financial evaluation benchmarks, surpassing other large-scale models like GPT-o1 and DeepSeek-R1 [14][53].

Group 2: Technical Innovations
- The model incorporates a more specialized financial task data labeling system, allowing it to function as an "expert" from the outset [20][21].
- It employs an efficient weighted training algorithm to significantly lower the application barrier for large models [20].
- The training process follows a two-phase strategy: initial comprehensive knowledge injection, followed by targeted reinforcement learning on challenging tasks [34][35].

Group 3: Evaluation Standards
- Ant Group introduced a new evaluation benchmark, Finova, to assess the model's effectiveness in real-world financial scenarios [38][41].
- Finova evaluates models on agent execution capabilities, complex reasoning abilities, and safety compliance, and consists of 1,350 financial problems [41][52].
- The introduction of Finova aims to provide a more rigorous assessment than existing financial evaluation sets, which are considered too simplistic [39][51].

Group 4: Industry Impact
- Ant Group has a deep understanding of the financial sector, having served 100% of state-owned banks and over 60% of city commercial banks, which enhances its model's relevance [58][60].
- The Agentar brand serves as a window for Ant Group's AI practices in finance, linking numerous financial institutions to scale the application of large models [60][61].
- The advancements in Agentar-Fin-R1 reflect Ant Group's accumulated industry insights, data, and AI capabilities [61].
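The article does not detail the weighted training algorithm, but the general idea behind the second phase, drawing harder tasks more often than easy ones, can be sketched generically. The weights, task names, and function below are purely illustrative, not Ant Group's method:

```python
import random

def weighted_batches(examples, weights, batch_size=4, steps=3, rng=random):
    """Yield training batches sampled with per-example weights.

    Generic importance-sampling sketch: examples with larger weights are
    drawn proportionally more often (sampling is with replacement).
    """
    for _ in range(steps):
        yield rng.choices(examples, weights=weights, k=batch_size)

# Illustrative phase-2 setup: upweight the hard reasoning tasks.
examples = ["easy_qa", "compliance_case", "hard_reasoning"]
weights = [1.0, 2.0, 4.0]  # hard tasks drawn ~4x as often as easy ones
for batch in weighted_batches(examples, weights, rng=random.Random(0)):
    print(batch)
```

In a real pipeline the weights would come from a difficulty or loss signal per example rather than being hand-set, and the batches would feed a fine-tuning loop.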
Does AGI need a world model? Top AI experts debate at a roundtable hosted by Tsinghua's Qiuzhen College
量子位· 2025-07-25 05:38
Author: Xu Senzhe; contributed by the Basic Science and Artificial Intelligence Forum

On July 20, 2025, the 2025 Basic Science and Artificial Intelligence Forum was held at the Zhongguancun Exhibition Center Conference Center.

Professor Sun Maosong, executive vice dean of the Institute for Artificial Intelligence at Tsinghua University, moderated. Four top AI experts joined the discussion: Dr. Liu Tieyan, dean of Beijing Zhongguancun Academy and chairman of the Zhongguancun Institute of Artificial Intelligence; Professor Wang Yu, chair of the Department of Electronic Engineering at Tsinghua University; Professor Gu Xianfeng of the State University of New York at Stony Brook; and Dr. Shen Yichen, founder and CEO of 曦智科技 (Lightelligence). Together they held a lively exchange on the fundamental questions of artificial intelligence.

From causality to original creativity, from compute bottlenecks to future architectures, the forum sketched out the frontier landscape of AI technology and the challenges ahead.

Some 500 audience members attended, drawn from Tsinghua University's Qiuzhen College, universities and middle schools across Beijing, and research institutions.

The forum was also a special event of the 2025 International Congress of Basic Science, hosted by Tsinghua University's Qiuzhen College and co-organized by CITIC Securities and the Zhongguancun Science City Management Committee. Now in its third consecutive year, it aims to promote interdisciplinary collaboration and exchange, giving young students a platform to learn about frontier research and spark their interest in exploration.

The forum discussion is summarized as follows:

Correlation ≠ causation: AI systems still face the threshold of becoming a science

Addressing the bottlenecks and limits of current AI technology, Professor Gu Xianfeng pointed out that today's AI methods still rest on correlation ...
GitHub's official AI IDE enters public beta! Write apps in natural language, full-stack applications generated in one minute
量子位· 2025-07-25 05:38
Core Viewpoint
- GitHub Spark, an AI-driven application development tool, simplifies the process of turning ideas into applications using natural language, backed by Microsoft and GitHub's extensive resources [2][6][30].

Group 1: GitHub Spark Features
- GitHub Spark allows users to create applications from simple text descriptions in under a minute, significantly streamlining the development process [9][14].
- The tool offers UI customization options, enabling users to modify layouts, colors, and even upload visual references for personalized designs [12][13].
- Spark automatically identifies data storage needs and manages cloud storage, addressing common challenges faced by AI development tools [17][29].

Group 2: Integration and Collaboration
- Users can connect their Spark applications to GitHub repositories, maintaining all modification records and enabling bidirectional synchronization [24].
- GitHub Copilot assists in generating code, drafting repair suggestions, and creating pull requests, facilitating seamless collaboration among developers [25][30].

Group 3: Pricing and Accessibility
- GitHub Spark is available to users subscribed to Copilot Pro+, priced at $39 per month or $390 per year, with additional charges for exceeding message limits [26].

Group 4: Strategic Implications for Microsoft
- The launch of GitHub Spark aligns with Microsoft's strategic focus on cloud services and open-source software development, leveraging Microsoft Azure for comprehensive support [28][30].
- By integrating development processes into a single platform, Microsoft aims to enhance its AI ecosystem and retain developers within the GitHub and Azure framework, potentially reaching 1 billion users [30].
Training data slashed to 1/1200! Tsinghua & Shengshu release a homegrown video-based embodied foundation model, generalizing efficiently to complex physical manipulation at SOTA level
量子位· 2025-07-25 05:38
Core Viewpoint
- The article discusses the breakthrough of the Vidar model developed by Tsinghua University and Shengshu Technology, which enables robots to learn physical operations from ordinary video, achieving a significant leap from virtual to real-world execution [3][27].

Group 1: Model Development and Capabilities
- Vidar builds on a base model called Vidu, which is pre-trained on internet-scale video data and further trained with millions of heterogeneous robot videos, allowing it to generalize quickly to new robot types with only 20 minutes of real robot data [4][10].
- The model addresses the challenges of data scarcity and the need for extensive multimodal data in current vision-language-action (VLA) models, significantly reducing the data requirements for large-scale generalization [5][6].
- Vidar's architecture includes a video diffusion model that predicts task-specific videos, which are then decoded into robotic arm actions by an inverse dynamics model [7][11].

Group 2: Training Methodology
- The embodied pre-training method proposed by the research team combines a unified observation space, extensive embodied data pre-training, and minimal target-robot fine-tuning to achieve precise control in video tasks [10].
- The model's performance was validated on the VBench video generation benchmark, showing significant improvements in subject consistency, background consistency, and imaging quality after embodied data pre-training [11][12].

Group 3: Action Execution and Generalization
- The introduction of task-agnostic actions allows for easier data collection and generalization across tasks, eliminating the need for human supervision and annotation [13][15].
- The automated task-agnostic random actions (ATARA) method enables the collection of training data for previously unseen robots in just 10 hours, facilitating full action-space generalization [15][18].
- Vidar demonstrated superior success rates in executing 16 common robotic tasks, particularly excelling in generalization to unseen tasks and backgrounds [25][27].

Group 4: Future Implications
- The advancements made by Vidar lay a solid technical foundation for future service robots to operate effectively in complex real-world environments such as homes, hospitals, and factories [27].
- The model represents a critical bridge between virtual algorithm training and real-world autonomous actions, enhancing the integration of AI into physical tasks [27][28].
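The inverse-dynamics step, recovering the action that moves the robot from one predicted frame to the next, can be illustrated with a drastically simplified toy. Here a "frame" is just a gripper's (x, y) position and the inferred action is the displacement between consecutive frames; the real Vidar model instead uses a learned network that maps pairs of video frames to arm commands:

```python
def inverse_dynamics(frames):
    """Recover an action sequence from consecutive predicted states.

    Toy stand-in for an inverse dynamics model: each frame is a gripper
    (x, y) position, and the action between frames i and i+1 is simply
    the displacement (dx, dy).
    """
    actions = []
    for (x0, y0), (x1, y1) in zip(frames, frames[1:]):
        actions.append((x1 - x0, y1 - y0))
    return actions

# Frames as the video diffusion model might predict them, then decoded.
predicted_frames = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.2)]
print(inverse_dynamics(predicted_frames))  # → [(0.1, 0.0), (0.0, 0.2)]
```

The division of labor is the point: the video model only has to predict what the scene should look like, and the inverse dynamics model translates that prediction into executable actions, which is why task-agnostic random-action data suffices to train the latter.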