30 fps on a Single 4090: Fan Haoqiang's Team Gets VLA Running in Real Time
Jiqizhixin (机器之心) · 2025-10-31 07:57
Core Insights
- The RT-VLA paper shows that a VLA model can run in real time, reaching up to 30 frames per second (fps) on a consumer-grade RTX 4090 GPU with a 3-billion-parameter model [2][6]
- The researchers optimized the model's structure, cutting inference time from over 100 milliseconds to as low as 27 milliseconds for dual-view scenarios, significantly outperforming previous results [2][6]
- A new algorithmic framework has been designed that could potentially achieve 480Hz closed-loop control, enabling real-time operation of VLA models [3][12]

Model Optimization
- The Pi0 model consists of a visual encoder, an encoder, and a decoder, all of which can be broken down into numerous matrix multiplications and scalar operations [8]
- The optimization process involved analyzing the model's inference steps, then merging and parallelizing calculations to eliminate bottlenecks, which shortened inference time [8][10]
- The outcome is a high-performance AI model capable of real-time tasks, likened to a "flash" in terms of speed [8][10]

Performance Demonstration
- In one demonstration task, the model reacted to a falling pen with an end-to-end response time of under 200 milliseconds, comparable to human performance [10][12]
- The framework allows for streaming real-time control of robots, with plans to generate control signals at a maximum frequency of 480Hz [12][15]

Future Prospects
- The research opens the door to VLA models taking part in real-time control, with potential advancements in edge computing capabilities [14]
- Future developments may explore pushing visual processing beyond 30fps and expanding model sizes while staying within real-time constraints [15]
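
The figures quoted above (27 ms inference, 30 fps, 480Hz) can be related with some simple arithmetic. The sketch below is only an illustration of that arithmetic, not code from the paper: the function names are hypothetical, and the idea that one camera frame would cover a fixed number of control ticks is an assumption made here for illustration, since the summary does not say how the 480Hz signal is produced.

```python
# Back-of-the-envelope timing budget using only the numbers quoted in the summary.
# All function names are hypothetical helpers, not part of any RT-VLA code.

def frame_budget_ms(fps: float) -> float:
    """Time available per camera frame, in milliseconds."""
    return 1000.0 / fps


def max_rate_hz(latency_ms: float) -> float:
    """Highest sustainable rate if each inference takes latency_ms and runs back to back."""
    return 1000.0 / latency_ms


def control_ticks_per_frame(control_hz: float, camera_fps: float) -> float:
    """Control ticks elapsing per camera frame (assumed relationship, for illustration)."""
    return control_hz / camera_fps


if __name__ == "__main__":
    budget = frame_budget_ms(30)   # per-frame budget at 30 fps
    headroom = budget - 27.0       # slack left after a 27 ms inference
    print(f"frame budget: {budget:.2f} ms, headroom: {headroom:.2f} ms")
    print(f"back-to-back rate at 27 ms: {max_rate_hz(27.0):.1f} Hz")
    print(f"control period at 480 Hz: {frame_budget_ms(480):.3f} ms")
    print(f"control ticks per 30 fps frame: {control_ticks_per_frame(480, 30):.0f}")
```

Read this way, a 27 ms inference fits inside the roughly 33 ms frame period at 30 fps, which is consistent with the real-time claim, while a 480Hz control loop leaves only about 2 ms per tick, far less than one full inference. That gap is presumably why the summary describes the 480Hz path as a separate streaming framework rather than simply a faster forward pass.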