IndexTTS2

Which AI applications do young people follow most? Bilibili releases a ranking
Guan Cha Zhe Wang· 2025-07-27 11:18
Core Insights
- Bilibili (B站) has emerged as a key platform for young users to engage with AI technology, highlighted by the release of the "Top 30 AI Applications" list based on user interest and engagement metrics [1][2]
- The platform has seen significant growth in AI-related content consumption, with over 140 million users engaging monthly and a year-on-year increase of over 100% in daily viewing time for AI content [2][3]

Group 1: AI Applications and User Engagement
- The "Top 30 AI Applications" list includes popular tools such as DeepSeek, Quark, and Kimi, which have generated substantial user interest and creative content on the platform [1][2]
- Bilibili's AI content ecosystem is primarily driven by young users, with over 80% of viewers being post-95s, indicating a trend towards a younger demographic engaging with AI [2][3]

Group 2: Content Creators and Community Engagement
- Prominent content creators (UP主) on Bilibili are instrumental in educating users about AI, with channels dedicated to sharing the latest AI technologies and tutorials [3]
- The platform has established an AI-themed video podcast space to facilitate discussions among creators, media, and industry professionals, contributing to a vibrant community around AI content [4]

Group 3: Technological Innovations and Future Prospects
- Bilibili showcased innovative AI projects at the 2025 World Artificial Intelligence Conference, including an AI-powered exam robot and a text-to-speech model (IndexTTS2) that excels in emotional voice synthesis [4][8]
- The company aims to build a supportive ecosystem for AI creators, emphasizing the importance of reducing barriers to entry for new developers and enhancing content creation efficiency through AI technologies [8]
Tencent Research Institute AI Express 20250715
Tencent Research Institute· 2025-07-14 14:38
Group 1: Generative AI Developments
- Comet is an "AI Agent native" browser designed to redefine the relationship between users and information, allowing for complex task execution across multiple tabs [1]
- Meta's acquisition of PlayAI for nearly $100 million aims to enhance its audio generation capabilities, complementing its broader AI Superintelligence strategy with a total annual investment of $72 billion [2]
- RoboBrain 2.0, developed by Zhiyuan Research Institute, surpasses GPT-4o in 10 evaluations, breaking through key capabilities in spatial understanding and long-chain reasoning [3]

Group 2: AI Tools and Applications
- Meitu's AI image agent "RoboNeo" allows users to perform various tasks like image retouching and website creation through simple commands, enhancing efficiency in image production [4][5]
- Bilibili's AI voice model IndexTTS2 achieves high-quality voice conversion with precise duration control and emotional expression, setting a new standard in voice synthesis [6]
- PixVerse's new "multi-keyframe generation" feature enables users to create coherent videos from multiple images, enhancing storytelling capabilities in video production [7]

Group 3: AI in Scientific Research
- The LabUtopia platform introduces a new paradigm for intelligent scientific laboratories, integrating cognitive models and robotic agents for closed-loop scientific exploration [9]

Group 4: Perspectives on AI in Programming
- DHH, the creator of Ruby on Rails, expresses disdain for AI programming assistants, advocating for hands-on coding as a means to develop skills and creativity [10]
- Perplexity's CEO emphasizes a strategy of combining a browser with intelligent agents to create a cognitive operating system, aiming to compete with Google through speed and user experience [11]
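Bilibili builds its own AI dubbing! An authentic American-accent version of Empresses in the Palace (甄嬛传) surfaces, so no more learning English from Xiaohongshu (Doge)
QbitAI· 2025-07-14 09:08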
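Bai Jiao, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

What happens when Empresses in the Palace (甄嬛传) and Let the Bullets Fly (让子弹飞) are both converted to English?

I often scroll past videos like this on Xiaohongshu, and the English just glides smoothly through my brain.

Now AI can handle it. Like this.

It not only matches the timbre and emotion of the original, but also keeps the lips in sync.

Great, I will never again need to go on Xiaohongshu and trouble a voice-over teacher to teach me English (Doge).

And the one behind it this time happens to be Bilibili, the platform that spawned so many meme videos. Well played, Bilibili.

The TTS model they released, IndexTTS2, has drawn considerable attention in the community.

Netizens say they cannot wait to use it to make funny videos.

IndexTTS2: AI dubbing without the pressure

Its biggest highlight is that, while achieving duration control, it can also reproduce emotional characteristics that match the prompt.

It supports two generation modes.

One specifies the number of tokens explicitly, to control duration precisely.

For example, the original audio sounds like this:

The requested replacement text is "Technology is only truly meaningful when it creates value for local communities."

Its duration is then set to 0.75x, 1x (original speed), and 1.25x of the original. The results sound like this.

The other requires no manual input: it generates speech automatically while preserving the prosodic features of the input prompt.

For example, an angry emotion.

Specified replacement text: "When you were walking in our house, you found a distant road; that is not strange enough."

In addition, it supports independent audio and emotional expression ...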
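
The explicit-token-count mode above comes down to simple arithmetic, sketched here in Python. This is an illustration only and not IndexTTS2's actual API. It assumes each generated token covers a fixed span of audio, so that duration scales linearly with token count. The function name `scaled_token_counts` and the reference count of 200 tokens are hypothetical.

```python
def scaled_token_counts(reference_tokens, scales=(0.75, 1.0, 1.25)):
    """Map each duration scale to a target token count.

    Assumption (not from the source): duration is proportional to the
    number of generated tokens, so scaling the token count by a factor
    scales the audio duration by the same factor.
    """
    if reference_tokens <= 0:
        raise ValueError("reference_tokens must be positive")
    # Round to the nearest whole token and never request fewer than one.
    return {scale: max(1, round(reference_tokens * scale)) for scale in scales}


if __name__ == "__main__":
    # Hypothetical reference clip that corresponds to 200 tokens.
    for scale, tokens in scaled_token_counts(200).items():
        print(f"{scale:.2f}x duration -> {tokens} tokens")
```

Under that assumption, a 200-token reference maps to 150, 200, and 250 tokens for the 0.75x, 1x, and 1.25x settings mentioned in the article.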