Core Viewpoint
- AI live streaming has evolved from a gimmick to a viable business model, showcasing the effectiveness of AI-generated digital hosts in engaging audiences and driving sales [2][5][24].

Group 1: AI Live Streaming Performance
- During the 618 shopping festival, an AI live stream featuring digital personas of Luo Yonghao and Zhu Xiaomu attracted over 13 million viewers and generated a GMV exceeding 55 million yuan, outperforming Luo's previous live stream debut [3][5].
- The digital hosts demonstrated a high level of interaction and humor, effectively engaging with the audience, which surprised even the original host, Luo Yonghao [4][5].

Group 2: Technology Behind Digital Hosts
- The digital personas were created using Baidu's multi-modal collaborative digital human technology, which integrates script-driven multi-modal collaboration, dynamic decision-making for real-time interaction, and high-fidelity long video generation [6][7].
- The core of this technology is script generation, which includes dialogue, multi-modal driving, and dynamic interaction, ensuring that the digital hosts' personalities and styles are accurately represented [10][12].

Group 3: Script Generation and Interaction
- The script generation process addresses three key issues: style modeling for diverse dialogue, character modeling for realistic personas, and content planning to ensure accuracy and engagement [12].
- Multi-modal driving allows the language model to generate dialogue while simultaneously producing visual and audio cues, enhancing the synchronization of speech and actions [13].

Group 4: Voice Synthesis and Emotional Expression
- Baidu's "text-controlled voice synthesis" approach ensures that the generated speech reflects emotional nuances and natural rhythm, making the digital hosts more relatable and engaging [16].
- The technology also addresses the challenges of dual-host interactions, ensuring seamless transitions and natural dialogue flow between digital personas [16].

Group 5: High Fidelity Video Generation
- The technology for generating high-fidelity digital humans focuses on achieving consistency across audio, visual, and dialogue elements, which is crucial for maintaining viewer immersion during long live streams [18][20].
- Baidu's approach includes modeling character and product interactions independently to ensure accurate and responsive engagement throughout the live stream [20].

Group 6: Future Implications
- Baidu's early investment in AI technology has positioned it as a leader in the field, with continuous advancements in its large model capabilities, enhancing the realism and intelligence of digital hosts [22][24].
- The success of the Luo Yonghao digital live stream exemplifies the practical application of Baidu's technology in commercial settings, indicating potential for further exploration of innovative business models [24].
Behind the Viral Lao Luo Digital Human, an AI Director Is Quietly Rewriting the Live-Streaming "Script"