Tencent Research Institute AI Digest 20260320
Tencent Research Institute · 2026-03-19 16:07
Group 1
- The Nvidia DGX Station GB300 has begun shipping, featuring 748GB of unified memory and a peak of 20 petaflops at FP4 precision, enough to support trillion-parameter models [1]
- The device is positioned as a local development platform for long-running autonomous agents; its architecture matches Nvidia's data-center systems for seamless scale-out, and it ships with the NemoClaw open-source software stack for secure operation [1]
- High-performance computing is shifting from the cloud back to the desktop, driven by AI agents evolving from experimental prompts into continuously running systems [1]

Group 2
- CMU and Princeton have released Mamba-3, which reaches 57.6% average accuracy at 1.5 billion parameters, beating a comparable Transformer by 4 points, with end-to-end inference latency only one-seventh of the Transformer's [2]
- Key improvements include exponential trapezoidal discretization for more precise memory, a complex-valued state space to shore up logical reasoning, and a MIMO mechanism that exploits otherwise idle GPU compute, matching Mamba-2 performance at half the state size [2]
- The team acknowledges that pure SSMs still trail Transformers on retrieval tasks and proposes a 5:1 hybrid architecture as a remedy, in line with broader industry trends [2]

Group 3
- Xiaomi has launched MiMo-V2-Pro, with over 1 trillion total parameters (42 billion active), using a hybrid attention architecture that supports a 1-million-token context; it ranks eighth globally and second domestically on the Artificial Analysis composite leaderboard [3]
- The model is deeply optimized for agent scenarios, with end-to-end task completion surpassing Claude Sonnet 4.6 and API pricing at one-fifth that of Opus 4.6 [3]
- It previously ran anonymously on OpenRouter as Hunter Alpha, accumulating over 1 trillion tokens of calls, and is now partnering with OpenClaw and Cline to offer limited-time free access [3]

Group 4
- Mianbi Intelligence has introduced EdgeClaw Box, an intelligent hardware device built on the open-source EdgeClaw framework that deploys models and agents locally; its MiniCPM edge models enable offline use with zero token consumption [4]
- The core innovation is self-developed privacy routing middleware that sorts data into three tiers by sensitivity (cloud by default, desensitized before upload, or mandatory local processing), and a dual-track memory mechanism prevents private data from leaking [4]
- The product is positioned as foundational infrastructure for digital companies in the OPC community, is compatible with mainstream hardware such as the Nvidia DGX Spark and Mac Mini, and is on pre-sale in an enterprise version [4]

Group 5
- Jieyue AI has released a desktop StepClaw client optimized for OpenClaw, supporting both Windows and Mac with no servers or command line required, lowering the barrier to using agents [5]
- It connects to an ecosystem of over 5,000 creators and applications, supports five asset types including skills, plugins, and triggers, and lets agents autonomously identify and fill their own capability gaps [6]
- Security features include dual review of application assets, local data storage, and pre-installed universal security configurations, alongside personalized avatar customization [6]

Group 6
- QQ Browser has introduced an AI PPT feature that generates structured presentations in one click, extracting core information from Word and PDF documents without switching tools [7]
- The feature can also build a report framework from scratch, automatically generating charts, matching images, and standardizing layouts [7]
- It covers scenarios including work reports, event planning, financial analysis, and job self-introductions, providing a seamless path from document to presentation [7]

Group 7
- Midjourney V8 Alpha has launched, with core upgrades including native 2K rendering, roughly five times faster generation, and stronger text rendering, with an emphasis on personalized control [8]
- V8 is a workflow reconstruction rather than a smooth upgrade from V7, so existing users must adapt to a new control logic and may face short-term disruption [8]
- The shift signals that competition among AI image tools is moving from single-image quality to style stability and workflow continuity, expanding the target market from inspiration images to brand visuals and serialized commercial production [8]

Group 8
- In a GTC summit dialogue, Jeff Dean and Bill Dally argued that inference has overtaken training as the main focus, with 90% of data-center power consumption now going to inference [9]
- Nvidia aims to compress latency toward physical limits by redesigning on-chip and off-chip communication architectures, targeting throughput of tens of thousands of tokens per user [9]
- Dean predicts the pre-training paradigm will be rewritten: future models should actively learn and act in environments rather than passively observe data streams, blurring the line between pre-training and post-training [9]
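The sensitivity-tiered privacy routing described in Group 4 can be sketched roughly as follows. This is a minimal illustration of the three-tier idea only: the tier names, the `route` function, and the toy `desensitize` step are all hypothetical, not EdgeClaw's actual middleware API.

```python
from enum import Enum

class Sensitivity(Enum):
    # Hypothetical tiers mirroring the three levels in Group 4
    LOW = "default_cloud"          # safe to send to the cloud as-is
    MEDIUM = "desensitized_cloud"  # redact first, then send to the cloud
    HIGH = "mandatory_local"       # must never leave the device

def desensitize(text: str) -> str:
    """Toy redaction step; real middleware would apply NER/regex rules."""
    return text.replace("Alice", "[REDACTED]")

def route(text: str, level: Sensitivity) -> tuple[str, str]:
    """Return a (destination, payload) pair for one piece of user data."""
    if level is Sensitivity.HIGH:
        return ("local", text)               # processed on-device only
    if level is Sensitivity.MEDIUM:
        return ("cloud", desensitize(text))  # cloud sees a redacted copy
    return ("cloud", text)                   # low sensitivity: cloud by default

# Usage
dest, payload = route("Alice's medical record", Sensitivity.MEDIUM)
print(dest, payload)  # cloud [REDACTED]'s medical record
```

The key design point the article highlights is that the routing decision happens before any network call, so highly sensitive data structurally cannot reach the cloud rather than merely being policy-discouraged.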