Mercury
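"Taming" masked diffusion language models with more consistent trajectories and fewer decoding steps: large gains in the reasoning performance and efficiency of diffusion language models
机器之心 (Synced)· 2025-11-05 04:15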
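Diffusion large language models have advanced rapidly. In February 2025, Inception Labs released Mercury, the first commercial-grade diffusion large language model; around the same time, Renmin University of China released LLaDA, the first open-source 8B diffusion large language model; and Gemini Diffusion followed in May. All signs suggest that diffusion large language models are a strong contender for the foundational paradigm of next-generation large language models. However, decoding strategies and reinforcement learning algorithms for diffusion large language models remain underexplored.
Recently, a joint research team from Fudan University, Shanghai AI Laboratory, and Shanghai Jiao Tong University released a new paper, "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step". They propose a combination of an efficient decoding strategy and reinforcement learning training for masked diffusion large language models (Masked Diffusion Large Language Model, MDLM). The combination significantly improves the reasoning performance and efficiency of these models and opens a new path for the development of diffusion large language models.
Code repository: https://github.com/ ...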
X @IcoBeast.eth🦇🔊
IcoBeast.eth🦇🔊· 2025-08-29 14:11
Integrations & Partnerships
- Mercury App now features full native integration of Project X, enabling users to provide liquidity (LP) and potentially use other features on the go [1][2][3]
- Mercury has integrated Hypercore and HyperEVM into a single application [2]
- Mercury is pulling in many partner integrations, suggesting a wide range of features and options for users [2]
User Experience & Features
- Mercury offers a casual-user friendly experience with simple higher/lower trading choices [2]
- Deposits are easily managed via card or crypto [2]
- Perps (perpetual futures) integration is noted as "slick" [1]
Diffusion language models write code! 10x faster than autoregressive models
量子位 (QbitAI)· 2025-07-10 03:19
Core Viewpoint
- The article discusses the launch of Mercury, a new commercial-grade large language model based on diffusion technology, which can generate code at a significantly faster rate than traditional models.
Group 1: Model Innovation
- Mercury breaks the limitations of autoregressive models by predicting all tokens at once, enhancing generation speed [2]
- The model allows for dynamic error correction during the generation process, providing greater flexibility compared to traditional models [4][20]
- Despite using diffusion technology, Mercury retains the Transformer architecture, enabling the reuse of efficient training and inference optimization techniques [6][7]
Group 2: Performance Metrics
- Mercury's code generation speed can be up to 10 times faster than traditional tools, significantly reducing development cycles [8]
- On H100 GPUs, Mercury achieves a throughput of 1109 tokens per second, showcasing its efficient use of hardware [9][13]
- In benchmark tests, Mercury Coder Mini and Small achieved response times of 0.25 seconds and 0.31 seconds, respectively, outperforming many competitors [16]
Group 3: Error Correction and Flexibility
- The model incorporates a real-time error correction module that detects and corrects logical flaws in code during the denoising steps [21]
- Mercury integrates abstract syntax trees (AST) from programming languages like Python and Java to minimize syntax errors [22]
Group 4: Development Team
- Inception Labs, the developer of Mercury, consists of a team of experts from prestigious institutions, including Stanford and UCLA, with a focus on improving model performance using diffusion technology [29][34]
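To make the contrast with autoregressive decoding concrete, here is a minimal Python sketch of confidence-based parallel unmasking, one common way masked diffusion decoding is described. It is not Mercury's actual algorithm, which the summary does not specify. The names `toy_denoiser` and `parallel_decode` and the confidence formula are hypothetical; the toy denoiser simply reads tokens from a known target so that the loop is runnable.

```python
import math

MASK = None  # placeholder for a position that has not been decoded yet


def toy_denoiser(seq, target):
    """Hypothetical stand-in for one forward pass of a diffusion LM.

    Returns a (token, confidence) proposal for EVERY position in a single
    call. Confidence rises when neighbouring positions are already decoded.
    """
    proposals = []
    for i, tok in enumerate(target):
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        known = sum(n is not MASK for n in neighbours)
        confidence = 0.5 + 0.2 * known - 0.001 * i  # tiny tie-breaker on i
        proposals.append((tok, confidence))
    return proposals


def parallel_decode(target, steps):
    """Fill a fully masked sequence in roughly `steps` denoiser calls."""
    seq = [MASK] * len(target)
    per_step = math.ceil(len(target) / steps)  # tokens committed per call
    calls = 0
    while MASK in seq:
        proposals = toy_denoiser(seq, target)
        calls += 1
        # Commit only the most confident of the still-masked positions.
        masked = [i for i, t in enumerate(seq) if t is MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = proposals[i][0]
    return seq, calls


target = list("print(42)")
decoded, calls = parallel_decode(target, steps=3)
print("".join(decoded), calls)
```

With this 9-token target, `steps=3` finishes in 3 denoiser calls, whereas a token-by-token decoder would need 9. Fewer steps mean fewer calls, at the cost of committing more tokens per call on weaker evidence.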
Multimodal diffusion models are taking off: this time it is LaViDa, which is fast, controllable, and can learn to reason
机器之心 (Synced)· 2025-05-30 04:16
Core Viewpoint
- The article introduces LaViDa, a large vision-language diffusion model that combines the advantages of diffusion models with the ability to process both visual and textual information effectively [1][5].
Group 1: Model Overview
- LaViDa is a vision-language model that inherits the high speed and controllability of diffusion language models, achieving impressive performance in experiments [1][5].
- Unlike autoregressive large language models (LLMs), diffusion models treat text generation as a diffusion process over discrete tokens, allowing for better handling of tasks requiring bidirectional context [2][3][4].
Group 2: Technical Architecture
- LaViDa consists of a visual encoder and a diffusion language model, connected through a multi-layer perceptron (MLP) projection network [10].
- The visual encoder processes multiple views of an input image, generating a total of 3645 embeddings, which are then reduced to 980 through average pooling for training efficiency [12][13].
Group 3: Training Methodology
- The training process involves a two-stage approach: pre-training to align visual embeddings with the diffusion language model's latent space, followed by end-to-end fine-tuning for instruction adherence [19].
- A third training phase using distilled samples was conducted to enhance the reasoning capabilities of LaViDa, resulting in a model named LaViDa-Reason [25].
Group 4: Experimental Performance
- LaViDa demonstrates competitive performance across various visual-language tasks, achieving the highest score of 43.3 on the MMMU benchmark and excelling in reasoning tasks [20][22].
- In scientific tasks, LaViDa scored 81.4 and 80.2 on ScienceQA, showcasing its strong capabilities in complex reasoning [23].
Group 5: Text Completion and Flexibility
- LaViDa provides strong controllability for text generation, particularly in text completion tasks, allowing for flexible token replacement based on masked inputs [28][30].
- The model can dynamically adjust the number of tokens generated, successfully completing tasks that require specific constraints, unlike autoregressive models [31][32].
Group 6: Speed and Quality Trade-offs
- LaViDa allows users to balance speed and quality by adjusting the number of diffusion steps, demonstrating flexibility in performance based on application needs [33][35].
- Performance evaluations indicate that LaViDa can outperform autoregressive baselines in speed and quality under certain configurations, highlighting its adaptability [35].
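The reduction from 3645 to 980 visual embeddings mentioned under Technical Architecture can be checked with a little arithmetic. The summary gives only the two totals; the sketch below assumes, as one decomposition that reproduces them, 5 views of 27 x 27 embeddings each and non-overlapping 2 x 2 average pooling that keeps partial edge windows. `VIEWS`, `SIDE`, `pooled_side`, and `avg_pool_2d` are illustrative names, and the grid holds plain floats rather than embedding vectors.

```python
import math


def pooled_side(side, kernel=2):
    """Side length after non-overlapping pooling; partial edge windows are kept."""
    return math.ceil(side / kernel)


def avg_pool_2d(grid, kernel=2):
    """Average-pool a square grid of floats with ceil-style edge handling."""
    n = len(grid)
    out_n = pooled_side(n, kernel)
    out = []
    for r in range(out_n):
        row = []
        for c in range(out_n):
            # Edge windows may contain fewer than kernel * kernel cells.
            window = [
                grid[i][j]
                for i in range(r * kernel, min((r + 1) * kernel, n))
                for j in range(c * kernel, min((c + 1) * kernel, n))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


VIEWS = 5   # assumption: number of image views, not stated in the summary
SIDE = 27   # assumption: 27 x 27 = 729 embeddings per view

tokens_before = VIEWS * SIDE * SIDE
tokens_after = VIEWS * pooled_side(SIDE) ** 2
print(tokens_before, tokens_after)

small = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(avg_pool_2d(small))
```

Under these assumptions each view shrinks from 27 x 27 to 14 x 14, so the visual sequence handed to the language model is roughly a quarter of its original length.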