Video Generation's DeepSeek Moment: Tsinghua & Shengshu Open-Source a Framework with 200x Speedup, Earning 2k Stars in a Week
机器之心· 2025-12-26 04:35
This means AI video creation has broken further out of the traditional "render and wait" mode and reached a critical turning point toward an era of "real-time generation." The breakthrough quickly drew wide attention from the research community.

Editor | Du Wei

In the final days of 2025, the open-sourcing of a brand-new video generation acceleration framework declared that the era of "waiting several minutes to generate one video" is over. That framework is TurboDiffusion, jointly released by Tsinghua University's TSAIL team and Shengshu Technology.

How dramatic is the speedup? With almost no loss in generation quality, mainstream video generation models can generate a 5-second 720p video roughly 200x faster on a single RTX 5090, and generating a 5-second 480p video now takes under 2 seconds (see the animation below).

Users can now try the text-to-video and image-to-video model versions accelerated by TurboDiffusion:

| Model Name | Checkpoint Link | Best Resolution |
| --- | --- | --- |
| TurboWan2.2-I2V-A14B-720P | Huggingface Model | 720p |
| TurboWan2.1-T2V-1.3B-480P | Huggingfac ... |
Tencent Research Institute AI Digest 20251226
腾讯研究院· 2025-12-25 16:57
Group 1
- Nvidia has reached a non-exclusive licensing agreement with AI chip startup Groq, reportedly worth $20 billion, bringing over Groq founder Jonathan Ross and his engineering team [1]
- Groq focuses on LPU chips for inference, achieving an output speed of 500 tokens per second per card, roughly ten times faster than Nvidia's GPUs; its temporal instruction set architecture sidesteps HBM shortages and reduces costs [1]
- The deal follows a "technology licensing + talent acquisition" model: Groq continues its cloud business independently, while Nvidia strengthens its inference computing capabilities to target the market held by Google's TPUs [1]

Group 2
- Tsinghua's TSAIL Laboratory and Shengshu Technology have jointly open-sourced TurboDiffusion, a video generation acceleration framework that cuts the generation time of a 1.3B-480P model on a single RTX 5090 from 184 seconds to 1.9 seconds, a 97x speedup [2]
- The framework integrates four core technologies: SageAttention2++ quantization, SLA sparse linear attention, rCM step distillation, and W8A8 quantization, reducing end-to-end latency from 900 seconds to 8 seconds [2]
- SageAttention has been integrated into NVIDIA TensorRT and deployed on platforms such as Huawei Ascend and Moole Technology, with major companies including Tencent, ByteDance, and Alibaba already using it [2]

Group 3
- The Shanghai Municipal Planning and Resources Bureau and SenseTime have launched "Yunyu Xingkong," the first 600-billion-parameter foundation model for the national planning and resources domain; it can answer questions, adjust maps, run statistics, recognize images, and generate reports [3]
- The model is trained on the Kunyu Jinglue corpus and integrated with the government intranet's professional edition and core business systems, reaching 98% accuracy on specialized terminology and a 95% approval rate in human Q&A evaluation [3]
- It adopts a "1+6" (base + vertical) model system and an intelligent scheduling engine, supports natural language access to 2D and 3D spatial data, and explores a new paradigm of data productization and service-oriented government models [3]

Group 4
- Tencent Cloud and Anhui Yilu Weixing have launched "Assistant Agent," the first AI assistant in the ETC field, built on Tencent's Hunyuan model; it has served over one million users since internal testing began in April [4]
- The assistant integrates multimodal interaction, supporting both text and voice input, achieving 95% Q&A accuracy and a 90% problem-resolution rate, and handling complex requests such as device inquiries, traffic record checks, and invoicing [4]
- It deploys 105 state-monitoring algorithms to collect real-time device operation data, enabling voice interaction and proactive status reporting for a "service finds the person" capability that lets users control devices by voice [4]

Group 5
- Dexmal has proposed the GeoVLA framework, whose dual-stream architecture retains the VLM's semantic understanding while giving robots 3D geometric perception through a point cloud embedding network and a spatially aware action expert [6]
- On the LIBERO-90 long-horizon multi-task benchmark it achieved a 97.7% success rate, surpassing OpenVLA-OFT; it reached an average success rate of 77% on ManiSkill2 and averaged 86.3% across real-world tasks [6]
- It performed strongly in out-of-distribution robustness tests, maintaining a 60% success rate under varying basket heights and 70% under a 45° viewpoint shift, evidence that it understands true 3D spatial structure [6]

Group 6
- The SciMaster team, composed of Shanghai Jiao Tong University's TSAIL Laboratory, the Shanghai Algorithm Innovation Research Institute, and DeepSense Technology, has launched ML-Master 2.0, achieving a 56.44% medal rate on MLE-bench and topping the leaderboard [7]
- The system is designed for real machine learning engineering and introduces a hierarchical cognitive caching mechanism that models context as Experience, Knowledge, and Wisdom [7]
- It uses a "generate-validate" protocol to achieve ultra-long-horizon autonomy, with applications already in theoretical computational physics and embodied intelligence; waiting-list applications are open via the SciMaster platform [7]

Group 7
- Jim Fan, head of embodied intelligence at Nvidia, stated that Tesla's FSD v14 is the first AI to pass the physical Turing test; Elon Musk noted that "perception is maturing," and the software has launched in seven countries including the US [9]
- Tesla has built 14 technical moats, including freezing its sensor suite for 4-6 years to accumulate data, an instant value-judgment engine for intelligent data filtering, and a Neural Codec for processing raw Bayer data [9]
- An end-to-end transformer maps photon input to motor torque output, with hardware-in-the-loop quantized training on the vehicle chips of the Cortex supercomputer; 12 versions shipped within 77 days, though issues remain with lane selection and lane-change decisions [9]
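Of the four TurboDiffusion techniques listed in Group 2, W8A8 quantization is the most self-contained to illustrate: both weights and activations are rounded to INT8 with a per-tensor scale, the matrix multiply runs in integer arithmetic, and the result is rescaled back to float. A minimal NumPy sketch of the general idea (not the TurboDiffusion implementation; all names here are illustrative):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 tensor plus a float scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_matmul(activations, weights):
    """W8A8 GEMM: quantize both operands, multiply in int32, dequantize."""
    qa, sa = quantize_int8(activations)
    qw, sw = quantize_int8(weights)
    # Accumulate in int32 to avoid overflow, then rescale to float.
    acc = qa.astype(np.int32) @ qw.astype(np.int32)
    return acc.astype(np.float32) * (sa * sw)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)
exact = a @ w
approx = w8a8_matmul(a, w)
rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
```

The speed win on real hardware comes from the INT8 tensor-core path of the `qa @ qw` step; the relative error stays small because the int32 accumulator preserves the full dot-product sum before rescaling.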
Tsinghua's SageAttention3: 5x Speedup via FP4 Quantization, with First-Ever Support for 8-Bit Training
机器之心· 2025-06-18 09:34
Core Insights
- The article discusses advances in attention mechanisms for large models, focusing on SageAttention3, which offers significant performance improvements over previous versions and competitors [1][2]

Group 1: Introduction and Background
- Optimizing attention speed has become crucial as sequence lengths in large models grow [7]
- Previous SageAttention versions (V1, V2, V2++) achieved speedups of 2.1x, 3x, and 3.9x respectively over FlashAttention [2][5]

Group 2: Technical Innovations
- SageAttention3 delivers a 5x inference speedup over FlashAttention, reaching 1040 TOPS on an RTX 5090 and outperforming even the more expensive H100 running FlashAttention3 by 1.65x [2][5]
- Trainable 8-bit attention (SageBwd) accelerates training while matching full-precision attention on various fine-tuning tasks [2][5]

Group 3: Methodology
- The research team employed Microscaling FP4 quantization, using the NVFP4 format to improve FP4 quantization precision [15][16]
- A two-level quantization approach addresses the narrow range of the P matrix's scaling factors, further improving overall precision [15][16]

Group 4: Experimental Results
- SageAttention3 maintained end-to-end accuracy across video and image generation tasks [21][22]
- In specific tests it achieved a 3x speedup on HunyuanVideo, with significant reductions in processing time across multiple models [33][34]
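The microscaling idea behind Group 3 can be sketched in a few lines: instead of one scale per tensor, each small block of elements (16 in NVFP4) gets its own scale, so outliers in one block do not destroy precision everywhere else, and each scaled value is rounded to the nearest entry of the 4-bit E2M1 grid. A NumPy sketch of that idea (not SageAttention3's kernel; real NVFP4 stores the block scales in FP8, kept here as plain floats for clarity):

```python
import numpy as np

# Representable E2M1 (FP4) magnitudes; signs are stored separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 uses one shared scale per 16-element block

def quantize_fp4_microscaling(x):
    """Round each 16-element block onto the signed FP4 grid after
    scaling the block so its max magnitude maps to 6.0."""
    x = x.reshape(-1, BLOCK)
    scales = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales = np.where(scales == 0, 1.0, scales)  # avoid div-by-zero
    scaled = x / scales
    # Nearest-neighbour rounding of each magnitude onto the grid.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(256).astype(np.float32)
q, s = quantize_fp4_microscaling(x)
x_hat = dequantize(q, s)
rel_err = np.abs(x_hat - x).max() / np.abs(x).max()
```

The two-level scheme mentioned for the P matrix layers a second, coarser scale on top of these block scales; the sketch above shows only the single-level microscaling step.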