Hybrid Models
Shanghai AI Lab Releases Hybrid Diffusion Language Model SDAR: The First Open-Source Diffusion Language Model to Exceed 6600 tgs
机器之心 · 2025-11-01 04:22
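Slow, costly inference has become the core bottleneck limiting the wide adoption of large models. The root cause is the serial, token-by-token generation of autoregressive (AR) models. To address this problem, Shanghai AI Laboratory recently proposed a new paradigm, SDAR (Synergistic Diffusion-AutoRegression). Through a "training-inference decoupling" design, the method combines the high performance of AR models with the parallel-inference advantage of diffusion models, and can convert any AR model into a parallel-decoding model at very low cost.
Paper title: SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Authors: Cheng Shuang (程爽), first-year PhD student jointly trained by Shanghai AI Laboratory and Zhejiang University; Bian Yihan (卞一涵), second-year master's student at the University of Maryland and intern at Shanghai AI Laboratory; Liu Dawei (刘大卫), first-year PhD student jointly trained by Shanghai AI Laboratory and Shanghai Jiao Tong University; Qi Biqing (齐弼卿), researcher at Shanghai AI Laboratory (advisor).
Experiments show that SDAR not only matches or exceeds the original AR models on multiple benchmarks, but also delivers a several-fold real-world inference speedup. SDAR also shows great potential on complex scientific reasoning tasks. Compared with an identically configured AR ...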
Hands-On With DeepSeek's Quietly Launched New Model: Better at Coding Than Claude 4, but Writing... Never Mind
36Kr · 2025-08-20 12:14
Core Insights
- DeepSeek has officially launched and open-sourced its new model, DeepSeek-V3.1-Base, following the release of GPT-5 and before any release of R2 [1]
- The new model has 685 billion parameters and supports multiple tensor types, with significant inference-efficiency optimizations and a context window expanded to 128k [1]
Model Performance
- In initial tests, DeepSeek V3.1 scored 71.6% on the Aider Polyglot programming benchmark, ahead of other open-source models and of the closed-source Claude 4 Opus [5]
- The model processed a long text and returned relevant literary recommendations, showing that it can handle complex queries [4]
- In programming tasks, DeepSeek V3.1 generated code that handled collision detection correctly and modeled realistic physical properties [8]
Community and Market Response
- Hugging Face CEO Clément Delangue noted that DeepSeek V3.1 quickly climbed to fourth place on the trending chart and later reached second place, indicating strong market interest [79]
- The update removed the "R1" label from the deep thinking mode and introduced native "search token" support, enhancing search functionality [79][80]
Future Developments
- The company plans to discontinue the mixed thinking mode and instead train separate Instruct and Thinking models to ensure higher-quality outputs [80]
- As of the latest update, the model card for DeepSeek-V3.1-Base has not yet been released; further technical details are anticipated [81]