Digital Human Technology
Baidu's Digital Human Technology Wins the Leading Technology Award at the 2025 World Internet Conference
Xin Hua Cai Jing· 2025-11-06 11:08
Core Points
- Baidu's "script-driven multi-modal collaborative high-fidelity digital human technology" won the Leading Technology Award at the 2025 World Internet Conference [2]
- The award aims to recognize the most forward-looking technological achievements in the global internet sector, with 424 submissions from 34 countries and regions [2]
- The technology integrates multi-modal planning, real-time interaction, text-controlled voice synthesis, and ultra-realistic long video generation, overcoming challenges in multi-modal real-time collaboration and complex dynamic interactions [2]

Company Achievements
- Baidu has produced over 100,000 digital humans using its technology, applied across various industries such as e-commerce, education, and law [2]
- The implementation of this technology has led to an 80% reduction in broadcasting costs and a 31% increase in live streaming conversion rates [2]
- The technology is currently being utilized in Baidu's e-commerce scenarios, including celebrity, book, and health live broadcasts [2]
1024 Programmers' Day: JD's “Zero Frame Start” Digital Human Brings “Zero-Barrier” Creation to Everyone
Zhong Guo Zhi Liang Xin Wen Wang· 2025-10-24 14:33
Core Insights
- The integration of AI technology in JD's 11.11 shopping event enhances marketing, customer service, and user experience, allowing merchants to compete effectively and providing consumers with a smarter shopping experience [1][4]
- The "Zero Frame Start" digital human mini-program launched by JD enables users and merchants to create high-quality digital content easily, lowering the barriers to entry for digital content creation [1][3]

Group 1: Technology and User Experience
- JD's mini-program can generate long videos with minimal errors and high-quality output, addressing traditional issues in digital human video production [3]
- The user base of the "Zero Frame Start" mini-program increased by 111% since the 11.11 event, showcasing its effectiveness for both ordinary users and merchants [3]

Group 2: Applications and Benefits
- Ordinary users can create personalized content quickly, such as family videos or educational content, while merchants can utilize the mini-program for lightweight marketing strategies [3][4]
- The technology not only democratizes content creation but also reduces operational costs for businesses, demonstrating the practical value of AI in everyday applications [4]
Keling AI Launches New Digital Human Feature
Huan Qiu Wang· 2025-09-19 06:40
Core Insights
- Keling AI has launched a new digital human feature that allows users to generate expressive digital human videos with minimal input, supporting up to 1 minute of video generation [1][3]
- The service is designed to be user-friendly, enabling content creators and small businesses to easily create high-quality digital human videos at a low cost of only 0.12 yuan per second [3][8]

Technology and Features
- Keling AI's digital human technology boasts industry-leading lip-sync accuracy and emotional expression capabilities, allowing for a more lifelike performance compared to traditional digital humans [4][6]
- The platform supports various character types, including realistic humans, cartoons, and animals, and can generate videos in multiple languages [3][4]
- The technology utilizes a deep integration of multimodal understanding models and video generation models, ensuring precise synchronization of speech and lip movements even in multilingual contexts [7]

Market Position and Impact
- Keling AI has achieved significant user growth, surpassing 45 million users and generating over 200 million videos since its launch in June 2024 [8]
- The introduction of the digital human feature is expected to lower the creative barriers in the industry and enhance production standards, facilitating large-scale applications in short videos, e-commerce live streaming, online education, and corporate services [8]
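As a rough illustration of the pricing quoted above, the sketch below estimates what a clip would cost at 0.12 yuan per second with the 1-minute cap. The function name, the rounding, and the assumption that the quoted rate applies uniformly are illustrative only; other coverage describes 0.12 yuan as a minimum, so real prices may be higher.

```python
# Illustrative cost estimate based on the figures quoted in the summary.
# Assumption: a flat rate of 0.12 yuan per second (the lowest quoted price).
RATE_YUAN_PER_SECOND = 0.12
MAX_SECONDS = 60  # the feature supports videos of up to 1 minute


def estimate_cost(seconds: float, rate: float = RATE_YUAN_PER_SECOND) -> float:
    """Return the estimated price in yuan for a clip of the given length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    # Round to 2 decimals (fen) to hide floating-point noise.
    return round(seconds * rate, 2)


if __name__ == "__main__":
    for duration in (15, 30, 60):
        print(f"{duration:>2} s -> {estimate_cost(duration):.2f} yuan")
```

Under this assumption, a full-length 60-second video would cost about 7.2 yuan.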
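Keling AI Launches New Digital Human Feature; Weimob Group Secures USD 200 Million from International Long-Term Investor | Future Business Morning Brief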
Mei Ri Jing Ji Xin Wen· 2025-09-18 23:14
Group 1: Express Delivery Industry
- In August 2025, the express delivery business volume reached 16.15 billion pieces, with a year-on-year growth of 12.3% [1]
- The express delivery revenue for August 2025 was 118.96 billion yuan, reflecting a year-on-year increase of 4.2% [1]
- Cumulatively, from January to August 2025, the express delivery revenue totaled 958.37 billion yuan, up 9.2% year-on-year, while the total business volume reached 128.2 billion pieces, marking a 17.8% increase [1]

Group 2: Weimob Group
- Weimob Group successfully raised 200 million USD through a subscription agreement with Infini Capital, an international long-term investment institution [2]
- The funds raised will primarily be used for investment and research in AI, as well as for international expansion [2]
- This capital injection is expected to optimize Weimob's shareholder structure and enhance its strategic positioning in the AI sector [2]

Group 3: Keling AI
- Keling AI launched a new digital human feature that allows users to create 1080p/48FPS digital human videos up to one minute long using a character image and a text or audio input [3]
- The technology integrates multimodal understanding with video generation models, achieving precise lip-syncing and emotional action control [3]
- The application prospects for digital human technology are broad, with potential uses in entertainment, education, customer service, and personalized marketing solutions [3]
Keling AI Launches New Digital Human Feature: Minimal Input, High-Quality Output, and Videos of Up to 1 Minute
Huan Qiu Wang· 2025-09-18 13:45
Core Insights
- Keling AI Digital Human has redefined industry standards with its advanced lip-sync accuracy, emotional expression, and cross-style generalization capabilities [1][2][9]
- The product allows users to generate expressive digital human videos of up to one minute by simply uploading an image and providing text or audio input, with a minimum cost of 0.12 yuan per second [1][2][11]

Group 1: Product Features
- Users can create high-quality digital human videos at resolutions up to 1080p and 48FPS, catering to various applications such as product explanations, news broadcasting, and online education [2][5]
- The platform supports a wide range of characters, including realistic humans, anime, and animals, enabling diverse content creation [5][9]
- Keling AI offers a one-stop solution for users, allowing them to upload their own materials or use built-in character libraries and TTS voice options [2][11]

Group 2: Performance and Technology
- Keling AI Digital Human demonstrates industry-leading lip-sync accuracy, effectively matching complex lip movements to fast-paced lyrics and emotional expressions [5][8]
- The technology integrates a multi-modal understanding model with a video generation model, achieving precise synchronization of voice and lip movements across multiple languages and fast speech [9][11]
- In comparative tests, Keling AI Digital Human outperformed industry competitors, achieving a GSB score of 2.39 against OmniHuman-1 and 1.37 against HeyGen, establishing itself as a market leader [9][11]

Group 3: Market Impact
- Since its launch in June 2024, Keling AI has undergone over 30 iterations, attracting over 45 million users and generating more than 200 million videos, serving over 20,000 enterprises [11]
- The introduction of Keling AI Digital Human is expected to lower creative barriers and enhance production standards, promoting large-scale applications in short videos, e-commerce live streaming, online education, and corporate services [11]
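The summary quotes GSB scores without defining them. GSB (Good/Same/Bad) is a pairwise human evaluation in which raters judge each sample as better than, equal to, or worse than a competitor's. The sketch below assumes one commonly used aggregation, (G + S) / (B + S). Whether Keling AI used exactly this formula is not stated in the source, and the vote counts in the usage lines are made up for illustration.

```python
def gsb_score(good: int, same: int, bad: int) -> float:
    """Aggregate pairwise votes as (G + S) / (B + S).

    Assumption: this is one common GSB definition, not necessarily the one
    behind the figures quoted in the article. A value above 1.0 means the
    evaluated system was preferred more often than the baseline.
    """
    if min(good, same, bad) < 0:
        raise ValueError("vote counts must be non-negative")
    if bad + same == 0:
        raise ValueError("score is undefined when there are no 'bad' or 'same' votes")
    return (good + same) / (bad + same)


if __name__ == "__main__":
    # Hypothetical vote counts, not data from the article.
    print(gsb_score(good=50, same=20, bad=30))  # more wins than losses
    print(gsb_score(good=30, same=40, bad=30))  # evenly matched
```

Under this definition a score of 1.0 means the two systems are evenly matched, so both quoted figures (2.39 and 1.37) would indicate a preference for Keling AI, with a wider margin in the first comparison.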
Luo Yonghao's First Update After Returning to Work: Two Additional Digital Human Livestreams Next Week
Sou Hu Cai Jing· 2025-09-16 14:05
Core Viewpoint
- The team of Luo Yonghao announced that they will host two digital human live broadcasts on Baidu's e-commerce platform, Baidu Youxuan, next week, marking their first work update since the controversy over pre-made dishes [1]

Group 1
- Luo Yonghao's digital human debut on Baidu Youxuan attracted 13 million viewers and achieved a GMV of over 550 million yuan [1]
- Many netizens commented that the digital human's performance was indistinguishable from a real person and equally effective [1]

Group 2
- The digital human technology used by Luo Yonghao is developed by Baidu and is based on the new generation of Huibo Star digital human technology, which utilizes the Wenxin large model 4.5 Turbo [1]
- This technology integrates multi-modal planning and deep thinking for script generation, enabling real-time interactive dynamic decision-making, achieving a high level of unity in the digital human's "spirit, form, sound, appearance, and speech" [1]
Five Years and USD 2.1 Billion: Baidu's Acquisition of YY Live Finally Concludes, with Over 1,000 Employees Integrated into Its System
Sou Hu Cai Jing· 2025-09-11 03:41
Core Insights
- Baidu's acquisition of YY Live has made significant progress after nearly five years of challenges, with the deal valued at $2.1 billion (approximately 15.2 billion RMB) [1][3]
- The integration of YY Live into Baidu's ecosystem is complete, with over 1,000 employees adopting Baidu's organizational structure and performance evaluation systems [1][4]

Industry Overview
- YY Live, a pioneer in China's live streaming industry, initially thrived by capturing user entertainment needs and innovating with virtual gift systems, achieving over 2 billion RMB in revenue by 2014 [3]
- The rise of short video platforms in 2018 significantly impacted YY Live, leading to a decline in user engagement and necessitating a strategic shift for the company [3]

Acquisition Details
- Baidu's journey in the live streaming sector has been tumultuous, initially attempting partnerships in 2016 and later attempting to acquire YY Live for $3.6 billion in 2020, which faced multiple setbacks [3]
- The acquisition price was reduced by 42% to $2.1 billion, reflecting a recalibration of market conditions and the diminishing returns in the live streaming sector [3]

Integration Challenges
- The complexity of the merger exceeded expectations, with significant time spent on aligning human resource systems and administrative processes [4]
- The transition to Baidu's performance evaluation system has been completed, but challenges remain in merging corporate cultures and facilitating cross-department collaboration [4]

Technological Synergy
- The introduction of AI digital human technology has been pivotal for the integration, with YY Live launching an AI companion that enhanced user interaction significantly [6]
- Baidu's advancements in digital human applications in e-commerce have shown promising results, indicating potential for YY Live to leverage Baidu's AI capabilities for commercial breakthroughs [6]

Future Outlook
- The merger positions Baidu to enhance its entertainment content segment, potentially leading to a new phase of evolution in its mobile ecosystem [6]
- The combination of Baidu's technological maturity and YY Live's operational experience is expected to create new paradigms for traditional live streaming platforms [6]
Which AI Digital Human API Is Best in 2025? An In-Depth Analysis
Sou Hu Cai Jing· 2025-09-04 15:23
Core Insights
- Digital human technology is becoming a focal point across various industries, with applications ranging from virtual anchors to intelligent customer service and live streaming sales [1][5]
- Companies are advised to evaluate digital human API service providers based on specific needs, technical capabilities, cost-effectiveness, integration support, and data security [5][6]

Company Summaries
- **Niwawa Smart Cloud**: Stands out in text generation and digital human modeling, focusing on low-latency, high-performance streaming services suitable for applications such as online live streaming and video content production. It provides extensive customization options so that users can align digital human appearances and behaviors with brand identity [1]
- **Kezhan Cloud**: Utilizes proprietary technologies such as a 3D Gaussian splatting engine and a voiceprint-muscle direct drive encoder, supporting real-time 4K quality switching with an end-to-end latency of less than 120ms. It excels in high-concurrency scenarios like live e-commerce and virtual customer service, claiming to offer services at one-third the cost of larger competitors [3]
- **Yimeng AI**: Collaborates with Volcano Engine to provide various models like OmniHuman and DreamActor M1, applicable in fields such as marketing, gaming, and interactive performances. Backed by ByteDance, it offers a robust technical foundation and a wide range of innovative models for developers [3]
- **Tencent Cloud**: Known for stable and efficient digital human API services, with extensive experience in customizing digital human appearances and voice replication. Its interactive digital human API supports various driving methods suitable for industries like education and customer service [3]
- **Huawei Cloud MetaStudio**: Features precise lip-syncing and natural movements, supporting text, voice, and video-driven interactions. It leverages substantial computing resources to meet high concurrency demands, particularly in education and customer service [4]
- **Niwawa Development Platform V2**: Enhances lip-sync accuracy and includes intelligent emotional perception capabilities, offering multiple API interfaces for applications in online customer service, smart navigation, education, and live streaming sales [4]

Considerations for Choosing Digital Human API
- **Application Scenario Requirements**: Different scenarios have varying demands on digital humans, such as expressiveness for live sales and accuracy for customer service [5]
- **Technical Capability Assessment**: Key technical indicators include realism, lip-sync accuracy, natural movement, and voice quality, with latency being critical for real-time interactions [5]
- **Cost-Effectiveness Analysis**: Understanding pricing models (e.g., per call, per generation duration, subscription plans) is essential for estimating costs based on expected usage [5]
- **Integration and Technical Support**: Evaluating the clarity of API documentation and the level of technical support provided by the service provider is crucial [5]
- **Data Security and Compliance**: Special attention is needed for sensitive sectors like finance and healthcare regarding data security strategies and compliance capabilities [5]
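The cost-effectiveness point above can be made concrete with a small comparison of the three pricing models it names (per call, per generation duration, subscription). Everything in this sketch is hypothetical: the prices, the usage profile, and the function names are placeholders for illustration and do not correspond to any provider mentioned in the article.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Expected monthly usage (hypothetical numbers supplied by the caller)."""
    calls_per_month: int
    avg_seconds_per_call: float

    @property
    def total_seconds(self) -> float:
        return self.calls_per_month * self.avg_seconds_per_call


def per_call_cost(usage: Usage, price_per_call: float) -> float:
    return usage.calls_per_month * price_per_call


def per_second_cost(usage: Usage, price_per_second: float) -> float:
    return usage.total_seconds * price_per_second


def subscription_cost(usage: Usage, monthly_fee: float,
                      included_seconds: float, overage_per_second: float) -> float:
    # Only the seconds beyond the included quota are billed as overage.
    extra = max(0.0, usage.total_seconds - included_seconds)
    return monthly_fee + extra * overage_per_second


def compare_plans(usage: Usage) -> dict:
    """Monthly cost in yuan under three made-up price lists."""
    return {
        "per_call": per_call_cost(usage, price_per_call=2.0),
        "per_second": per_second_cost(usage, price_per_second=0.10),
        "subscription": subscription_cost(
            usage, monthly_fee=1500.0, included_seconds=20_000,
            overage_per_second=0.08),
    }


if __name__ == "__main__":
    costs = compare_plans(Usage(calls_per_month=1000, avg_seconds_per_call=30.0))
    for plan, cost in sorted(costs.items(), key=lambda item: item[1]):
        print(f"{plan:<12} {cost:>8.2f} yuan/month")
```

Which plan is cheapest depends heavily on the usage profile: short, frequent calls tend to favor per-second billing, while long generations favor per-call or subscription pricing, so it is worth rerunning the comparison with realistic numbers.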
Luster LightTech (凌云光) 202509004
2025-09-04 14:36
Summary of Company and Industry Insights

Company Overview
- The company operates in the consumer electronics industry, benefiting from an increase in Apple's visual system market share, which now accounts for 50% of its revenue. The printing and packaging business has shown a robust growth of 16%, while the original customer business grew by 28%, leading to overall strong performance [2][3].

Key Highlights
- Significant advancements have been made in 3D visual motion capture, digital human technology, and AI model applications, enhancing the company's core competitiveness. Notable projects include collaboration with CCTVs on the lighthouse project and defect classification with Enjie [2][4].
- International business is rapidly growing, with the printing and packaging segment expected to exceed 100 million yuan in revenue. The integration effects post-acquisition of GAI have contributed to stable performance increments [2][5][6].
- The company is exploring new directions in optical communication, investing in server internal optical connection technology, which is anticipated to open new opportunities [2][7].

Financial Performance
- In the first half of 2025, the company reported a revenue and profit growth exceeding 25%, with net profit attributable to shareholders increasing by 10%. The visual system business, driven by a major client, saw a sequential growth of 37% and a year-on-year growth of 43% [3].

Technological Innovations
- The company has made significant progress in 3D visual motion capture, particularly for the Bisen intelligent robot. The use of multi-camera imaging systems has led to breakthroughs in digital human technology [4].
- AI technology has driven substantial changes in network architecture and traffic demand, with optical switching technology showing promising applications in new data centers due to its transparency and flexibility [2][24][26].

Market Trends
- The demand for optical switching technology has increased due to the rise of AI, which necessitates high bandwidth and low latency solutions. Major companies like Google have adopted optical switches to enhance data center efficiency [27][39].
- The company is focusing on expanding its market share through internationalization and investment in emerging technologies such as AI models and optical computing [8][50].

Future Strategies
- The company aims to continue its strategic direction towards healthy growth and scale development, emphasizing core business growth in areas like visual systems and motion capture. Increased R&D investment is planned to maintain technological leadership [8][50].
- The company is also looking to expand into new industry applications based on its consumer electronics foundation, particularly in the automotive and new energy sectors [17].

Challenges and Opportunities
- The transition to new data center architectures faces challenges, including hardware issues and the need for extensive software support. However, these challenges also present opportunities for innovation and market leadership [29][30].
- The company is well-positioned to capitalize on the shift of the North American AI network equipment supply chain to China, presenting new growth opportunities [49].

Conclusion
- The company is strategically positioned in the consumer electronics and optical communication sectors, with strong growth prospects driven by technological advancements and market demand. Continued investment in R&D and international expansion will be crucial for sustaining competitive advantages and capturing emerging opportunities in the evolving landscape.
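Can You Chat with Me Forever? Fudan & Microsoft Propose StableAvatar: The First End-to-End Framework for Infinite-Length Audio-Driven Human Video Generation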
机器之心· 2025-08-30 04:12
Core Viewpoint
- The article discusses the advancements in AI-driven digital human video generation, particularly focusing on the limitations of current methods and the introduction of the StableAvatar framework to achieve high-fidelity, infinite-length audio-driven video generation [2][5].

Group 1: Current Limitations
- Existing methods for generating audio-driven human videos can only produce clips shorter than 15 seconds, leading to noticeable body distortions and inconsistencies, especially in facial areas when attempting longer videos [2][3].
- Current strategies, such as motion frame utilization and sliding window mechanisms, can improve video smoothness but do not fundamentally address the quality degradation in infinite-length video generation [2][3].

Group 2: Proposed Solutions
- The StableAvatar framework, developed by research teams from Fudan, Microsoft, and XJTU, aims to enable infinite-length, high-fidelity audio-driven human video generation, with open-source code available for both inference and training [5].
- The framework utilizes a novel Timestep-aware Audio Adapter to optimize audio embeddings, reducing the accumulation of latent distribution errors that occur during the video generation process [11].

Group 3: Technical Innovations
- The audio embeddings are processed through a denoising diffusion model, with a new Audio Native Guidance method introduced to enhance lip-sync and facial expression generation by integrating audio features with latent variables [9][15].
- A dynamic weighted sliding-window strategy is implemented to ensure that overlapping latent variables from adjacent windows maintain a coherent feature mix, enhancing the overall video quality [17].
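To make the idea of a weighted sliding window more tangible, the sketch below fuses overlapping windows with a linear cross-fade, so frames in the overlap are a weighted mix of both neighbours. This is a generic illustration, not StableAvatar's implementation: the summary does not give the actual weighting scheme, each "latent" is reduced to a single float for readability, and the function name and parameters are hypothetical.

```python
def blend_windows(windows, window_len, stride):
    """Fuse overlapping fixed-length windows into one sequence.

    Assumption: a linear cross-fade in the overlap region. The weights
    actually used by StableAvatar are not described in the summary.
    """
    overlap = window_len - stride
    if not 0 <= overlap <= stride:
        raise ValueError("need 0 <= overlap <= stride so at most two windows overlap")
    if any(len(w) != window_len for w in windows):
        raise ValueError("every window must contain window_len frames")

    total = stride * (len(windows) - 1) + window_len
    acc = [0.0] * total    # weighted sum of values at each frame position
    wsum = [0.0] * total   # sum of weights at each frame position

    for k, win in enumerate(windows):
        start = k * stride
        for i, value in enumerate(win):
            weight = 1.0
            if k > 0 and i < overlap:
                # Fade in: the new window gains influence across the overlap.
                weight = (i + 1) / (overlap + 1)
            if k < len(windows) - 1 and i >= window_len - overlap:
                # Fade out: the old window loses influence across the overlap.
                weight = (window_len - i) / (overlap + 1)
            acc[start + i] += weight * value
            wsum[start + i] += weight

    # Normalize so the weights at every position sum to one.
    return [a / s for a, s in zip(acc, wsum)]


if __name__ == "__main__":
    # Two 4-frame windows with a 2-frame overlap: values ramp from 0 to 3.
    print(blend_windows([[0, 0, 0, 0], [3, 3, 3, 3]], window_len=4, stride=2))
```

The ramp avoids a hard seam at window boundaries: instead of jumping from one window's output to the next, the transition is spread over the overlapping frames.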