【今跃教育】The Evolution of vivo's Messaging System Architecture in Massive-Data Scenarios
Sou Hu Cai Jing· 2025-10-10 21:42
Group 1: Core Insights
- Vivo's mobile internet business serves over 400 million users with applications, short videos, and advertising, processing daily data volumes in the range of hundreds of billions [1]
- The transition from Kafka to Apache Pulsar addresses scalability and performance issues, enabling effective management of massive data traffic and improving operational efficiency [3][4]

Group 2: Business Challenges
- Vivo's original Kafka-based messaging system faced limitations due to increasing topic and partition numbers, leading to performance degradation and high operational costs [3]
- The inability of Kafka to dynamically scale and the reliance on partition numbers for performance created significant challenges during traffic spikes [3]

Group 3: Technical Selection
- Apache Pulsar was chosen for its advantages, including a stateless broker architecture that supports rapid scaling and a unique bundle mechanism that manages large numbers of topics effectively [4]
- Pulsar's support for multiple consumption modes enhances its ability to handle varying traffic demands and ensures message order [4]

Group 4: Implementation and Optimization
- Vivo optimized Pulsar's bundle management and data retention strategies, improving data distribution and monitoring capabilities [5][6]
- Adjustments to load balancing and client performance parameters significantly enhanced the system's ability to handle high message volumes [6]

Group 5: Didi's Big Data Operations
- Didi's big data team adopted Apache Pulsar in 2021, replacing the DKafka system and resolving long-standing operational challenges [7][9]
- The transition to Pulsar improved performance, cost efficiency, and reliability, addressing issues such as disk I/O bottlenecks and complex load balancing [8][9]

Group 6: Didi's Implementation and Optimization
- Didi optimized hardware configurations and utilized Pulsar's ensemble mechanism to ensure balanced data distribution and efficient resource utilization [10]
- The system's design allows for quick scaling and fault recovery, ensuring continuous service during peak loads and failures [10][12]
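
To make the ensemble point under Group 6 concrete, here is a minimal sketch of round-robin striping, the scheme Apache BookKeeper (Pulsar's storage layer) uses to spread entries across an ensemble of storage nodes ("bookies"). Each entry is written to a write-quorum-sized subset of the ensemble, and the starting position rotates with the entry ID. The bookie names, the sizes, and the function names `write_set` and `replica_counts` are hypothetical and chosen only for illustration; nothing here reflects Didi's actual configuration.

```python
from collections import Counter


def write_set(entry_id, ensemble, write_quorum):
    """Return the bookies that store one entry under round-robin striping.

    The entry goes to `write_quorum` consecutive bookies, starting at
    position `entry_id % len(ensemble)` and wrapping around the ensemble.
    """
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    size = len(ensemble)
    return [ensemble[(entry_id + k) % size] for k in range(write_quorum)]


def replica_counts(num_entries, ensemble, write_quorum):
    """Count how many entry replicas each bookie receives."""
    counts = Counter()
    for entry_id in range(num_entries):
        counts.update(write_set(entry_id, ensemble, write_quorum))
    return counts


if __name__ == "__main__":
    # Hypothetical ensemble of four bookies, two replicas per entry.
    bookies = ["bookie-0", "bookie-1", "bookie-2", "bookie-3"]
    for entry_id in range(4):
        print(entry_id, write_set(entry_id, bookies, write_quorum=2))
    print(replica_counts(400, bookies, write_quorum=2))
```

Because the starting bookie rotates with every entry, no single disk absorbs all writes for a topic, which is one plausible reading of the "balanced data distribution" claim above. Since a new ensemble can be chosen when a new ledger is opened, a newly added bookie can start receiving writes without existing data being moved.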