Digital Human Technology
Hands Out Lucky Bags, Riffs on Memes, Analyzes Users' Past Behavior: Baidu Releases New-Generation Digital Human Technology
Group 1
- Baidu has announced its new-generation digital human technology, NOVA, which the article says brings top-tier livestream host capabilities into an era of scalable production [1]
- NOVA previously powered Luo Yonghao's digital human livestream, which achieved a GMV of 55 million yuan [1]
- The technology is expected to open to the entire industry in October, giving ordinary users professional livestreaming capabilities comparable to those of top hosts [1]

Group 2
- NOVA brings six major capability upgrades: script modeling, action generation, voice cloning, script writing, Q&A, and interactive gameplay [2]
- It enables the first "dual digital host" setup, in which two digital humans collaborate seamlessly during a livestream [2]
- Backed by Baidu's upgraded Wenxin 4.5T model, digital humans can now "understand creation," which deepens their product expertise during livestreams and allows personalized interaction [2]

Group 3
- NOVA generates scripts that match a character's persona, and the digital human then performs them [5]
- Digital humans can actively engage with users, sustaining the high-frequency interaction seen in human-led livestreams, including lotteries and red envelope giveaways [5]
- The AI brain answers user questions in real time, adjusts video content to meet user demand, and proactively guides interaction based on users' historical behavior [5]
How Do You Run a Realistic 3D Digital Human in Real Time on a Phone? MNN-TaoAvatar Is Now Open Source!
Jiqizhixin (机器之心) · 2025-06-25 00:46
Core Viewpoint
- TaoAvatar is a 3D digital human technology developed by Alibaba's Taobao Meta Technology team. It enables real-time rendering and AI dialogue on mobile and XR devices, giving users a realistic virtual interaction experience [1][8].

Group 1: Technology Overview
- TaoAvatar uses 3D Gaussian splatting to create lifelike full-body avatars that capture fine facial expressions and gestures, as well as details such as clothing folds and hair movement [8].
- The technology significantly reduces the cost and raises the efficiency of digital human modeling, which makes large-scale application feasible [9].
- MNN-TaoAvatar is an open-source 3D digital human application that integrates several leading AI technologies and allows natural voice interaction with digital humans on mobile devices [10].

Group 2: Performance Metrics
- The application runs efficiently on mobile devices. Key metrics per model [16][17][18]:
  - ASR (Automatic Speech Recognition): model size 281.65 MB, RTF 0.18
  - LLM (Large Language Model): model size 838.74 MB, pre-fill speed 165 tokens/s, decode speed 41.16 tokens/s
  - TTS (Text-to-Speech): model size 1.34 GB, RTF 0.58
  - A2BS (Audio-to-BlendShape): model size 368.71 MB, RTF 0.34
  - NNR (rendering output): model size 138.40 MB, rendering frame rate 60 FPS

Group 3: Development and Optimization
- MNN-TaoAvatar is built on the MNN engine, which supports multiple algorithm modules and improves the performance of AI applications in real-time scenarios [23][30].
- The MNN-LLM module shows strong CPU performance: its pre-fill speed is 8.6 times that of llama.cpp and its decoding speed is 2.3 times that of llama.cpp [34].
- The MNN-NNR rendering engine uses optimizations such as data synchronization and scheduling to render efficiently, achieving smooth 60 FPS output even when updates arrive at a lower frequency [40][45].

Group 4: Hardware Requirements
- Recommended hardware for MNN-TaoAvatar is a device with a Qualcomm Snapdragon 8 Gen 3 or equivalent CPU, at least 8 GB of RAM, and 5 GB of storage for model files [51].
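The RTF and tokens-per-second figures quoted in this summary can be combined into a rough per-turn compute budget. The sketch below relies on the conventional definition of real-time factor (processing time divided by audio duration). Everything else is an assumption made for illustration and does not come from the article: the stages are treated as running strictly one after another with no streaming or overlap, and the utterance length, token counts, and reply length are hypothetical.

```python
"""Rough per-turn compute budget from the quoted MNN-TaoAvatar metrics."""

# Figures quoted in the summary above.
ASR_RTF = 0.18                 # speech recognition
TTS_RTF = 0.58                 # speech synthesis
A2BS_RTF = 0.34                # audio-to-blendshape
PREFILL_TOKENS_PER_S = 165.0
DECODE_TOKENS_PER_S = 41.16


def seconds_from_rtf(rtf: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


def llm_seconds(prompt_tokens: int, reply_tokens: int) -> float:
    """Time to pre-fill the prompt plus time to decode the reply."""
    return prompt_tokens / PREFILL_TOKENS_PER_S + reply_tokens / DECODE_TOKENS_PER_S


def turn_budget(user_audio_s: float, prompt_tokens: int,
                reply_tokens: int, reply_audio_s: float) -> dict[str, float]:
    """Per-stage compute time, ASSUMING the stages run sequentially (no streaming)."""
    budget = {
        "asr": seconds_from_rtf(ASR_RTF, user_audio_s),
        "llm": llm_seconds(prompt_tokens, reply_tokens),
        "tts": seconds_from_rtf(TTS_RTF, reply_audio_s),
        "a2bs": seconds_from_rtf(A2BS_RTF, reply_audio_s),
    }
    budget["total"] = sum(budget.values())
    return budget


if __name__ == "__main__":
    # Hypothetical turn: 5 s of user speech, a 100-token prompt,
    # a 50-token reply, and 4 s of synthesized reply audio.
    for stage, seconds in turn_budget(5.0, 100, 50, 4.0).items():
        print(f"{stage:>5}: {seconds:.2f} s")
```

Because every quoted RTF is below 1, each audio stage finishes faster than the audio it handles plays back. The sequential total is therefore a pessimistic figure; a pipeline that streams partial results between stages would respond sooner.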
The Real Luo Yonghao's Livestream Loses to the Fake Luo Yonghao? Netizens: Isn't That Just Lao Luo Playing an AI?
QbitAI (量子位) · 2025-06-18 07:49
Core Viewpoint
- The article covers the debut of Luo Yonghao's digital avatar during the 618 shopping festival, which set records for sales and viewer engagement and showcased Baidu's advances in digital human technology [1][7][11].

Group 1: Digital Human Performance
- Luo Yonghao's digital avatar drew over 13 million views and generated a GMV (Gross Merchandise Volume) exceeding 55 million yuan during the livestream [7].
- The avatar outsold Luo Yonghao's own in-person livestream from May, indicating significant improvement in digital human capabilities [7][28].
- The interaction between the digital avatars of Luo Yonghao and Zhu Xiaomu was seamless and convincingly mimicked real human interaction [2][15].

Group 2: Technological Advancements
- Baidu's digital human technology, Huibo Star, includes a high-persuasion digital human model that combines image, perception, decision-making, and action [11][30].
- The first dual digital human interactive livestream room supports natural dialogue and interaction, improving the viewer experience [13][14].
- The technology uses advanced language models and multimodal systems to create dynamic interactions, making the digital human more lifelike and engaging [30][31].

Group 3: Market Impact and Accessibility
- Advances in digital human technology have lowered the entry barrier for new streamers, so even those without prior experience can livestream with digital avatars [50].
- Small and medium-sized businesses have reported significant increases in order volume after adopting digital humans for continuous livestreaming [51][54].
- Baidu's Dream Butterfly and Starry Sky plans aim to support merchants by increasing the number of digital human avatars and providing financial backing for their use [59][60].

Group 4: Broader Industry Implications
- Digital humans are spreading across sectors: over 100,000 businesses use the technology, with an average GMV increase of 62% and an 80% reduction in operational costs [58].
- The article stresses that digital human technology is not exclusive to top streamers but is a new form of shared productivity accessible to a wider audience [61].
Behind Luo Yonghao's Digital Human Livestream May Lie the Start of a New Business Model
Sou Hu Cai Jing· 2025-06-18 07:45
Core Viewpoint
- The emergence of advanced digital human technology may mark the birth of a new business model, potentially termed "IPaaS" (IP as a Service), which could change how intellectual property (IP) is used in commercial activities [5][21].

Group 1: Digital Human Technology
- Luo Yonghao's recent livestream showcased significant advances in digital human technology and surprised many viewers with its realism and engagement [4][21].
- Earlier attempts to use digital humans in livestreams were criticized for lacking realism and emotional engagement, which highlights how quickly the technology has improved over the past year [3][21].

Group 2: Business Model Transformation
- IPaaS suggests a shift from traditional IP management to a service-oriented model in which digital humans handle commercial activities, leaving the real IP creators free to focus on content creation [5][25].
- The software industry's move from purchasing software to renting it (SaaS) is offered as a parallel: digital representations of an IP could likewise reduce costs and increase efficiency [13][14].

Group 3: Implications for IP Management
- The limitations of real human IPs, such as time constraints and the need for physical presence, could be mitigated by digital humans, which can operate continuously without health issues [15][20].
- The ability of digital humans to replicate the persona of a real IP raises questions about trust and authenticity, both of which are crucial for keeping audiences engaged [27][30].

Group 4: Future Considerations
- The proliferation of digital humans could saturate the market, diminishing their novelty and their effectiveness in engaging audiences [28].
- Concerns about misuse of digital human technology for misinformation or illegal activity point to the need for regulatory frameworks [30].
Decoding the Sales "Ledger" of Digital Lao Luo
Bei Jing Shang Bao· 2025-06-17 14:34
Core Insights
- The livestream featuring the AI-generated digital human "Lao Luo" achieved a GMV (Gross Merchandise Volume) exceeding 55 million yuan, with over 13 million views and a 150% increase in order volume compared with traditional livestreams [2][3][4]

Group 1: Digital Human Technology
- Digital human technology has evolved from stage 1.0 to stage 3.0. The latest version achieves a high level of realism and interactivity, supporting dynamic decision-making and real-time interaction [5][6]
- Production involves training the digital human on large amounts of data from the real person, which yields a natural and engaging presentation style [5][6]
- Baidu's digital human technology is based on advanced multimodal models that generate the scripts driving the digital human's interactions [6]

Group 2: Market Potential and Cost
- The overall digital human market is projected to exceed 100 billion yuan by 2026, indicating significant growth potential across sectors [9]
- Implementing digital human technology currently costs around 1,000 yuan per month, and costs are expected to fall further as the technology matures [8]
- Digital humans are seen as complementary to human hosts, improving efficiency in content creation and livestream operations [9]

Group 3: User Engagement and Interaction
- The digital human drove high user engagement, with a threefold increase in interaction rates and over 30% viewer retention during the livestream [3][4]
- Its ability to respond to trending topics and engage with viewers in real time was highlighted as a key feature of its performance [4]
- Humor and relatable content made for a more engaging viewer experience, showing the potential for digital humans to connect with audiences [4][5]
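Flash News | Luo Yonghao Officially Named Chief Experience Officer; Baidu E-commerce Announces Two Plans to Cultivate a Digital Host Ecosystem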
Xin Lang Cai Jing· 2025-06-17 09:46
Group 1
- The core event is the livestream debut of "Digital Person Luo Yonghao" and "Digital Person Zhu Xiaomu," which achieved a GMV of over 50 million yuan [1]
- The session lasted approximately 7 hours and drew over 13 million views, demonstrating the effectiveness of digital humans in e-commerce [1]
- The digital humans are built on Baidu's multimodal collaborative digital human technology, which supports real-time interaction and dynamic decision-making [1]

Group 2
- Luo Yonghao has been appointed Chief Experience Officer of Baidu Huibo Star and will take part in livestreams that combine his real and digital personas [2]
- Baidu plans two major initiatives: the Dream Butterfly plan, to increase the number of top digital hosts, and the Starry Sky plan, to provide 100,000 free digital humans and 100 million yuan in subsidies to support ordinary individuals and small businesses [4]
Silicon-based Intelligence Spends US$10 Million on the DUIX Domain: Pricey, Hard to Remember, and Perhaps Doomed to Flop?
36Kr · 2025-05-12 04:54
Core Viewpoint
- Silicon-based Intelligence's acquisition of the domain DUIX.com for $10 million raises questions about whether the investment will actually enhance brand value and market recognition, given the drawbacks of the name itself [1][3][17].

Group 1: Company Background
- Silicon-based Intelligence, established in 2017, is a well-known player in AI digital human technology, focusing on its development and application [4].
- The company has made significant progress in natural language processing, multimodal interaction, knowledge graphs, and emotional computing, with earlier successes in smart customer service and virtual assistants [6].

Group 2: DUIX Platform Overview
- The DUIX platform, short for Dialogue User Interface System, is positioned as next-generation infrastructure for intelligent digital human interaction. It supports various input methods and applications such as emotional companionship, smart customer service, education, healthcare, and gaming [6].
- Key features of DUIX include ultra-low-latency interaction and highly realistic digital human performance, which improve the user experience across scenarios [6].

Group 3: Domain Acquisition Analysis
- The purchase of DUIX.com reflects the scarcity of such names: there are only about 450,000 four-letter .com domains globally, which makes them highly sought after as digital assets [7].
- Despite the high acquisition cost, the commercial potential of DUIX.com has to be judged from the perspectives of brand communication and user experience, because scarcity does not equal brand value [7][12].

Group 4: Brand Name Challenges
- The name DUIX is hard to remember and pronounce, particularly for non-native English speakers, which could hinder user engagement and brand recognition [8][9].
- The name does not intuitively convey the core business concepts of "digital human," "intelligence," or "interaction," which raises the cognitive load for users trying to understand the service [9].

Group 5: Cost-Benefit Considerations
- A $10 million domain purchase is substantial for an AI startup, and its return hinges on whether it can drive user traffic and market acceptance [11].
- A hard-to-spell domain like DUIX may create barriers in search and social sharing, potentially negating some of the advantages of its scarcity [12].

Group 6: Strategic Implications
- The decision to invest heavily in DUIX.com raises concerns about the company's strategic foresight and understanding of market dynamics, since the name's drawbacks could impede effective brand promotion [13][17].
- The pursuit of a globally neutral name may overlook the importance of user experience in the company's primary market, which remains China [14].

Group 7: Future Outlook
- The success of DUIX.com will depend on whether the company can overcome the name's inherent challenges through strong operations and innovative product experiences [17].
- If the investment does not deliver the expected results, it could serve as a cautionary tale for other tech companies that try to make a splash in brand building without a solid foundation [17].
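The "about 450,000" scarcity figure cited in Group 3 can be checked with simple counting. The sketch below assumes the figure refers to names made only of the 26 letters a to z (no digits or hyphens); the function name is made up for illustration.

```python
import string

LETTERS = string.ascii_lowercase  # the 26 letters a-z


def count_letter_names(length: int, alphabet_size: int = len(LETTERS)) -> int:
    """Number of distinct names of the given length over the alphabet.

    Each position is chosen independently, so the count is
    alphabet_size ** length.
    """
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n}-letter names: {count_letter_names(n):,}")
    # 26 ** 4 = 456,976, which matches the "about 450,000" figure.
```

Under the same assumption, each extra letter multiplies the pool by 26, so five-letter names are far less scarce than four-letter ones.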
In the Short-Video Era, How Do You Make a Digital Human Stand Out? A Professional Team Reveals Its Core Filming Rules
Sou Hu Cai Jing· 2025-05-09 03:39
Group 1
- Digital human technology is emerging as a new frontier in brand marketing, and Ming Shun Technology has publicly shared its core filming methodology to give industry practitioners actionable solutions [1]

Group 2
- Scene selection significantly affects content performance: digital human short videos shot in real-world park settings achieve a 42% higher completion rate than green-screen shoots [3]

Group 3
- Equipment choices matter at a fine level of detail. Experiments show that filming with the rear camera improves facial micro-expression detail by 38%, and that placing the camera at waist height produces more natural postures, leading to a threefold increase in positive comments [5]

Group 4
- Motion capture reshapes the interaction experience: in educational settings, digital humans raised course conversion rates by 27% through natural gestures and eye contact, supported by a proprietary motion capture system that accurately reproduces 15 micro-expressions [7]
- Ming Shun Technology's dual-track model of "technical delivery + operational empowerment" is changing the industry ecosystem by providing a 4K-level digital human short video system together with established operational methodologies [7]
- As artificial intelligence develops, digital humans are moving from a technical concept to a commercial tool, which puts a premium on genuine content that resonates emotionally with audiences [7]
客易云 (Keyiyun) in 2025: Digital Human Technology Innovation and Ecosystem Empowerment Reshape the Industry's Top Standards
Sou Hu Cai Jing· 2025-04-13 06:48
Core Technology
- The company has developed a lip-teeth linkage engine based on anatomical principles. By coordinating 68 subtle actions and bite states in real time, it achieves a 99.3% match between lip movements and those of real actors [3]
- Dynamic lighting compensation analyzes ambient light and adjusts lip highlights and teeth reflections, keeping the image visually consistent under varying lighting conditions [3]
- A library of 12 standard tooth models allows personalized adjustment, improving tooth restoration accuracy by 40% over the industry average [3]

Full-Chain Intelligent Tools
- The AI broadcasting generation technology lets users clone a high-fidelity digital human image from a 3-5 minute real video and completes the whole process of script writing, voiceover, and editing in 60 seconds [5]
- An intelligent editing system integrates AI trending-topic analysis and multilingual intelligent subtitles, supporting automated short video production in over 100 languages, with a 300% increase in material production efficiency [5]
- The digital human supports real-time voice response and micro-expression feedback, using natural language processing for precise interaction across language scenarios [5]

Global Ecosystem Layout
- The system runs on multiple platforms, including apps, mini-programs, and H5, with one-click switching between languages to support rapid expansion into global markets [7]
- Users can clone digital human images without limit and create sub-accounts for team collaboration, with flexible computing-power top-up options to match business needs [7]
- The company offers an open agent recruitment plan and a subscription-based cloud service that reduces customization costs to 40% of the industry average, with an annual payment model available [7]

Industry Trends and Future Role
- As digital human technology evolves from a "tool" into "infrastructure," the company is promoting access to the technology through APIs and customized industry solutions [9]
- Its private deployment capabilities and modular design meet the compliance needs of sectors such as finance and government, while the integration of "digital human + AI large models" lays the groundwork for future innovation in multimodal interaction and emotional computing [9]
- Driven by both favorable policy and technological progress, the company has established itself as a leading force in the digital human field and may accelerate the industry's shift from "high-cost regulation" to "industrial production" through its "infinite cloning + infinite computing power" business model [9]

Conclusion
- The company redefines the technical standards of digital human services through biological-level detail processing, full-chain intelligent tools, and a global ecosystem moat, lowering the entry barrier for enterprises [11]
- As AI technology and business needs become deeply integrated, the company is leading the digital human industry toward a new age of "detail accessibility" and "intelligent symbiosis" [11]