Doubao Speech Synthesis Model 2.0
Behind 1.9 Billion Interactions: How AI Became the "New Star" of the Spring Festival Gala
Xin Lang Cai Jing· 2026-02-18 13:07
Core Insights
- The 2026 Spring Festival Gala showcased significant advances in AI technology, particularly in content creation and audience interaction, marking a shift from traditional methods to AI-driven experiences [1][3][19]
Group 1: AI in Content Creation
- The gala featured AI-generated visuals that enhanced traditional art forms, such as an animated rendering of Xu Beihong's "Six Horses" painting that preserved the character of Chinese ink painting while adding dynamic movement [5][7]
- ByteDance's Seedance 2.0 model interpreted and rendered complex artistic elements, producing intricate details and movements in the performances and demonstrating a leap in AI's ability to handle cultural nuance [7][9]
- Spatial video technology enabled real-time rendering of multiple digital avatars of performer Liu Haocun, showcasing AI's potential for creating immersive experiences [11][9]
Group 2: Audience Interaction
- The gala's interactive component shifted from traditional methods such as red envelope giveaways to AI-driven experiences, in which users generated personalized avatars and festive messages through the Doubao app [2][12]
- On New Year's Eve, Doubao AI interactions reached 1.9 billion, with over 50 million themed avatars and 100 million festive messages generated, indicating how far generative AI has been integrated into everyday life [2][15]
- The transition from fixed content to real-time AI generation represents a fundamental change in user engagement, from passive consumption to active participation [14][15]
Group 3: Accessibility and Inclusivity
- Real-time subtitles during the gala improved accessibility for hearing-impaired audiences, using advanced speech recognition technology to deliver accurate and timely captions [16][18]
- The Bumi robot's conversational capabilities, enhanced by AI voice synthesis, made its interaction with performers more engaging, showcasing the potential for AI to create emotionally resonant experiences [18][16]
Group 4: Technological Infrastructure
- ByteDance's Ark platform handled the substantial computational demands of the AI interactions, using techniques such as cross-data-center scheduling and distributed caching to keep operation smooth during peak usage [19][15]
- The gala's success illustrates the growing role of AI as a catalyst for new cultural practices, blending traditional customs with modern technology to create unique experiences [19][3]
The Horses Behind Zhang Jie's "Yufeng Song" at the Spring Festival Gala Were Made with Seedance 2.0!
量子位· 2026-02-17 03:58
Core Viewpoint
- The article highlights the significant advances in AI technology showcased during the Spring Festival Gala, focusing on the capabilities of the Seedance 2.0 model and its integration with various AI applications in performance and interaction [2][42].
Group 1: AI Technology in Performance
- Zhang Jie's performance of "Yufeng Song" featured a background video created with the Seedance 2.0 model, which interpreted and animated traditional Chinese ink painting styles, a task that many foreign models struggled with [4][5].
- Seedance 2.0 was used in multiple performances, including the creative dance show "He Huashen," where its fine-grained control produced detailed visual effects [7][10].
- The model's adherence to physical and biomechanical principles allowed for realistic animation of galloping horses, showcasing its advanced instruction-following and multi-modal reference capabilities [8][10].
Group 2: Video Quality Enhancement
- Collaboration with the Volcano Engine video cloud team raised video quality to the Spring Festival Gala's broadcast standards, using super-resolution algorithms to upscale 720P to 8K and frame interpolation to raise the frame rate from 24 to 50 FPS [15][17].
- 4D Gaussian splatting technology enabled immersive visual experiences in which virtual dancers interacted seamlessly with real stage lighting [20][22].
Group 3: AI Interaction and User Engagement
- The Spring Festival Gala introduced AI-driven interactive features through the Doubao app, allowing users to generate personalized avatars and greetings, a shift from traditional transactional interactions to more complex, computationally intensive engagements [28][30].
- The Ark platform played a crucial role in managing the high traffic during the event, using a federated system to optimize resource allocation and ensure rapid response times for user requests [31][29].
Group 4: Broader Implications and Industry Impact
- The article emphasizes the widespread adoption of Doubao's AI models across industries including automotive, mobile, and robotics, highlighting its partnerships with major companies [40][41].
- The successful use of AI technologies during the Spring Festival Gala demonstrates their practical value and potential for real-world applications, reinforcing the notion that effective AI solutions can deliver tangible benefits [43][44].
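The video enhancement figures in Group 2 (720P to 8K, 24 to 50 FPS) can be put in perspective with a little arithmetic. The sketch below is only illustrative: the summary does not give exact frame sizes, so 1280×720 and 7680×4320 are assumed here as the usual meanings of "720P" and "8K", and the helper names are made up for this example.

```python
from fractions import Fraction

# Assumed frame sizes: the summary only says "720P" and "8K".
SRC_SIZE = (1280, 720)    # common 720P frame (assumption)
DST_SIZE = (7680, 4320)   # 8K UHD frame (assumption)


def upscale_factors(src, dst):
    """Return (width factor, height factor, pixel-count factor)."""
    width = Fraction(dst[0], src[0])
    height = Fraction(dst[1], src[1])
    return width, height, width * height


def aligned_output_frames(src_fps, dst_fps):
    """Count output frames in one second whose timestamp matches a source frame."""
    return sum(
        1
        for k in range(dst_fps)
        if (Fraction(k, dst_fps) * src_fps).denominator == 1
    )


if __name__ == "__main__":
    w, h, pixels = upscale_factors(SRC_SIZE, DST_SIZE)
    print(f"per-axis scale: {w}x by {h}x, pixels per frame: {pixels}x")
    aligned = aligned_output_frames(24, 50)
    print(f"24 -> 50 FPS: {aligned} of 50 output frames align with a source frame")
```

Under these assumptions each axis is scaled 6x, so every output frame has 36 times as many pixels as its source. With evenly spaced timestamps, only 2 of every 50 output frames fall exactly on a source frame, so nearly every broadcast frame would have to be synthesized at an in-between position.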
Volcano Engine Upgrades the Doubao Model Series
Ke Ji Ri Bao· 2025-10-20 23:28
Core Insights
- Volcano Engine has released a series of updates for the Doubao large model, including Doubao 1.6, which natively supports multiple thinking lengths, and new models such as Doubao Voice Synthesis Model 2.0 and Doubao Voice Replication Model 2.0 [1]
Group 1: Model Updates
- Doubao 1.6 introduces four thinking lengths (minimum, low, medium, high) to balance model performance, latency, and cost for enterprises, making it the first model in China to support "tiered thinking length adjustment" natively [2]
- Doubao 1.6lite is a lighter version of the flagship model, offering faster inference and a 53.3% reduction in overall usage costs compared to Doubao 1.5pro in the most commonly used input range of 0-32k [2]
- The Smart Model Router, a solution for intelligent model selection, has been launched; it automatically selects the most suitable model for each task request, optimizing both performance and cost [2]
Group 2: Market Performance
- As of the end of September, daily token usage for Doubao has exceeded 30 trillion, an increase of over 80% since the end of May [1]
- According to IDC, Volcano Engine holds a 49.2% share of China's public cloud large model service market, ranking first [1]
Volcano Engine: Daily Average Tokens Exceed 30 Trillion
Bei Jing Shang Bao· 2025-10-16 13:48
Core Insights
- Volcano Engine has released a series of updates for the Doubao large model, including Doubao model 1.6, which natively supports multiple thinking lengths, and new models such as Doubao model 1.6 lite, Doubao voice synthesis model 2.0, and Doubao voice replication model 2.0 [1]
Summary by Categories
- **Product Updates**
  - Doubao model 1.6 introduces native support for various thinking lengths [1]
  - New models launched include Doubao model 1.6 lite, Doubao voice synthesis model 2.0, and Doubao voice replication model 2.0 [1]
- **Performance Metrics**
  - As of September 30, 2025, the daily average token usage for the Doubao large model exceeds 30 trillion, an increase of over 80% compared to the end of May [1]
Volcano Engine Releases Doubao Model Series Upgrades, Discloses Daily Average Tokens Exceeding 30 Trillion
21 Shi Ji Jing Ji Bao Dao· 2025-10-16 10:01
Core Insights
- Volcano Engine has released a series of updates for the Doubao large model: Doubao 1.6, which natively supports multiple thinking lengths, along with Doubao 1.6 lite, Doubao Speech Synthesis Model 2.0, and Doubao Voice Replication Model 2.0 [1][2]
Model Updates
- Doubao 1.6 is the first large model in China to support "tiered adjustment of thinking length," offering four options (Minimal, Low, Medium, and High) that balance model performance, latency, and cost [3]
- At the Low thinking length, the upgraded Doubao 1.6 produces 77.5% fewer total output tokens and spends 84.6% less time thinking while maintaining model effectiveness [3]
- Doubao 1.6 lite is lighter and faster than the flagship version, outperforming Doubao 1.5 pro by 14% in enterprise-level assessments and reducing overall usage costs by 53.3% in the most commonly used input range of 0-32k [3]
Speech Models
- The newly released Doubao Speech Synthesis Model 2.0 and Doubao Voice Replication Model 2.0 offer stronger emotional expressiveness and more precise instruction following, and can accurately read complex formulas aloud [8]
- These models reach 90% accuracy when reading complex formulas in subjects from elementary through high school, addressing a significant challenge in the industry [8]
Intelligent Model Routing
- Volcano Engine has introduced the Smart Model Router, the first intelligent model selection solution in China, which lets users choose among "Balanced Mode," "Effect Priority Mode," and "Cost Priority Mode" so that each task request is routed to the most suitable model [10]
- In tests, the Smart Model Router improved the effectiveness of the DeepSeek model by 14% in Effect Priority Mode, and in Cost Priority Mode reduced overall costs by over 70% while achieving similar results [10]
Market Position
- As of the end of September 2025, daily token usage of the Doubao large model has exceeded 30 trillion, an increase of over 80% since the end of May 2025 [1]
- Volcano Engine holds a 49.2% share of China's public cloud large model service market, ranking first according to IDC [1]
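To make the three routing modes named under Intelligent Model Routing more concrete, here is a toy selection rule. It is not Volcano Engine's Smart Model Router or its API: the candidate names, quality scores, costs, quality floor, and cost penalty are all invented for illustration, and the real product's selection logic is not described in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical model label
    quality: float  # made-up score in [0, 1]; higher is better
    cost: float     # made-up cost per request; lower is better


def route(candidates, mode, quality_floor=0.8, cost_penalty=0.03):
    """Pick one candidate according to a simple, hypothetical mode rule."""
    if not candidates:
        raise ValueError("no candidate models")
    if mode == "effect_priority":
        # Ignore cost and take the best-scoring model.
        return max(candidates, key=lambda c: c.quality)
    if mode == "cost_priority":
        # Take the cheapest model that still clears the quality floor.
        acceptable = [c for c in candidates if c.quality >= quality_floor]
        if not acceptable:
            raise ValueError("no candidate meets the quality floor")
        return min(acceptable, key=lambda c: c.cost)
    if mode == "balanced":
        # Trade quality against cost with a linear penalty.
        return max(candidates, key=lambda c: c.quality - cost_penalty * c.cost)
    raise ValueError(f"unknown mode: {mode}")


# Invented candidates, for illustration only.
MODELS = [
    Candidate("large", 0.95, 10.0),
    Candidate("medium", 0.90, 3.0),
    Candidate("small", 0.80, 1.0),
]

if __name__ == "__main__":
    for mode in ("effect_priority", "balanced", "cost_priority"):
        print(mode, "->", route(MODELS, mode).name)
```

With these invented numbers the three modes pick three different candidates, which shows why a cost-oriented mode can cut spending sharply while keeping results close to those of the strongest model, provided the cheaper candidate clears the quality floor.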
New Doubao Model Gets Guo Degang Shouting "Going-Crazy Literature": I'm Not Going (to This Job) Anymore! Not Going! Not Going!!!
量子位· 2025-10-16 06:11
Core Viewpoint
- The article discusses Volcano Engine's advances in AI voice technology, focusing on upgrades to the Doubao voice synthesis and voice replication models that improve emotional expression and contextual understanding in AI-generated speech [5][11][41].
Group 1: AI Voice Technology Upgrades
- Volcano Engine has upgraded its Doubao voice synthesis model to version 2.0, which allows for better emotional expression and understanding of dialogue [7][11].
- The upgrade covers two main models, Doubao voice synthesis model 2.0 and Doubao voice replication model 2.0, enabling AI to replicate voices and understand emotional nuance [7][8].
- The new models can interpret user instructions about emotion, dialect, tone, and speech rate, significantly improving the quality of AI-generated speech [12][21].
Group 2: Contextual Understanding and Emotional Expression
- The models can now draw on context from earlier dialogue, improving the coherence and emotional depth of the generated speech [12][23].
- Formula reading has improved: the Doubao model reaches around 90% accuracy when reading complex formulas in school subjects, compared to less than 50% for similar models [24][25].
- These advances allow more human-like interaction, moving from merely sounding human to understanding human emotion and context [11][41].
Group 3: Technological Innovations and Applications
- The Doubao large model 1.6 has been upgraded to support adjustable thinking lengths, allowing users to balance effectiveness, latency, and cost [30][33].
- Volcano Engine has introduced a Smart Model Router, which matches user tasks with the most suitable models and reduces costs by up to 71% in its cost-priority mode [39][41].
- The technology has been applied in various commercial scenarios, enhancing user experiences in products from companies such as Xiaomi and OPPO and improving responses to complex requests on platforms such as Dongchedi [45][46].
Group 4: Growth and Infrastructure
- Daily token usage of the Doubao large model has surged from 120 billion to over 30 trillion, a 253-fold increase in just over a year [47][48].
- This growth is supported by Volcano Engine's AI cloud infrastructure, which provides the computational power and high-quality data needed for model training and inference [48].
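The growth figures in Group 4 can be cross-checked with exact integer arithmetic. The sketch below uses only the numbers quoted above (120 billion, 30 trillion, 253-fold); the function names are made up for this example, and the "implied" value is a back-calculation, not a figure reported in the article.

```python
from fractions import Fraction

BILLION = 10**9
TRILLION = 10**12

START_TOKENS = 120 * BILLION    # earlier daily usage quoted in the summary
FLOOR_TOKENS = 30 * TRILLION    # "over 30 trillion" lower bound
REPORTED_FOLD = 253             # fold increase quoted in the summary


def fold_increase(start_tokens, end_tokens):
    """Exact ratio of end to start usage."""
    return Fraction(end_tokens, start_tokens)


def implied_end(start_tokens, fold):
    """Daily usage implied by a given starting value and fold increase."""
    return start_tokens * fold


if __name__ == "__main__":
    print("fold at exactly 30 trillion:", fold_increase(START_TOKENS, FLOOR_TOKENS))
    implied = implied_end(START_TOKENS, REPORTED_FOLD)
    print("usage implied by 253-fold:", Fraction(implied, TRILLION), "trillion")
```

Exactly 30 trillion would be a 250-fold increase over 120 billion, so the quoted 253-fold figure implies roughly 30.36 trillion tokens per day, which is consistent with the "over 30 trillion" wording.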