Tencent Research Institute AI Express 20250609

Group 1: OpenAI and Voice Technology
- OpenAI has upgraded the advanced voice feature in ChatGPT. The voice now sounds more natural and can express emotion and variations in tone, making conversations feel more human [1]
- The new real-time translation feature supports cross-language conversations, acting as a simultaneous interpreter in international settings, and is available to all paid users [1]

Group 2: ElevenLabs and Emotional Control
- ElevenLabs released its new TTS model, Eleven v3, which it claims is the most expressive text-to-speech model to date. It supports over 70 languages [2]
- The model introduces an audio tagging system for precise control of emotional expression, including emotion tags, sound effect tags, and special tags. Punctuation also affects emotional delivery [2]
- It supports multi-character dialogue, with different voices for different roles. It performs better in English than in Chinese and is currently in beta testing [2]

Group 3: OpenAudio S1 and Voice Cloning
- Fish Audio launched the OpenAudio S1 voice cloning model, which gives precise control over a voice's emotion, tone, and rhythm through simple commands, with results rivaling professional voice acting [3]
- It uses a dual autoregressive architecture and RLHF, supports 13 languages including Chinese and English, and ranks first in TTS-Arena [3]
- Pricing is set at $15 per million bytes (approximately $0.8 per hour), targeting the content creation and voiceover industries. Future plans include voice copyright registration and revenue sharing [3]

Group 4: PixVerse and User Engagement
- Aishi Technology launched "拍我AI," the domestic (China) version of PixVerse. PixVerse has gained 60 million users overseas and 16 million monthly active users, and previously ranked fourth overall in the U.S. [4]
- The product offers a wide range of features, including hundreds of templates, frame transitions, multi-subject capabilities, camera movements, and video redrawing, with generation taking under one minute [4][5]
- "拍我AI" balances fun and usability: casual users can quickly enjoy creative experiences, while professional creators get the functionality and efficiency they need [5]

Group 5: Zhiyuan's New Models
- Zhiyuan Research Institute released the new Wujie series of large models, aimed at bridging AI from the digital world to the physical world. The series comprises four models covering areas from microscopic life to embodied intelligence [6]
- The Wujie series includes the native multimodal world model Emu3, the brain science multimodal foundation model Jianwei Brainμ, the cross-entity embodied collaboration framework RoboOS 2.0, and the embodied brain RoboBrain 2.0, along with the atomic-level microscopic life model OpenComplex2 [6]
- Zhiyuan has open-sourced approximately 200 models and 160 datasets, with more than 640 million downloads worldwide, establishing a comprehensive open-source technology system for large models [6]

Group 6: AI in Mathematics
- Thirty top mathematicians secretly tested OpenAI's o4-mini at UC Berkeley and found that the model can solve about 20% of professor-level math problems, outperforming most participating teams [7]
- Mathematician Ken Ono acknowledged that AI demonstrates near-genius ability in mathematics, solving in minutes complex problems that would take human experts weeks or months [7]
- Terence Tao described on social media the remarkable progress of AI in mathematical research, indicating that AI will become a reliable collaborator in the field [7]

Group 7: Figure AI and Robotics
- Figure AI's humanoid robot Helix achieved significant breakthroughs after three months of logistics work and can now handle a variety of package types [8]
- Performance improved: processing time fell from 5.0 seconds to 4.05 seconds per package, and the barcode scanning success rate rose from 70% to 95%. The robot also demonstrated adaptive behaviors [8]
- These advances are attributed to improvements in three key technologies (visual memory, state history, force feedback) and an increase in training data from 10 hours to 60 hours. "Visual conditioning" also enables collaboration with humans [8]

Group 8: Apple's Research on Reasoning Models
- Apple's research questions the true reasoning capabilities of models such as DeepSeek and Claude, suggesting that they create an illusion of thought rather than possessing a stable thinking process [10]
- Tests with complex puzzles showed that reasoning models suffer "catastrophic failure" and "cognitive degradation" on high-complexity problems, often failing to execute even an algorithm they are given [10]
- The study identified three performance ranges: standard models do best on simple problems, reasoning models do better at moderate complexity, and both types fail at high complexity [10]

Group 9: OpenAI's Human-AI Emotional Connection
- Joanne Jang, OpenAI's head of model behavior, acknowledged that users are developing dependencies on ChatGPT and predicted that emotional bonds will deepen as AI systems become part of more areas of life [11]
- The article divides AI consciousness into "ontological consciousness" and "perceptual consciousness," forecasting that even if users recognize that AI lacks consciousness, perceived consciousness will still increase as models become more intelligent [11]
- OpenAI aims to strike a balance in product design, keeping ChatGPT warm and caring without pursuing emotional connections, and plans to expand its evaluations and share findings publicly [11]

Group 10: Google's AI Development
- Google CEO Pichai stated that as AI models mature, they will migrate to the main search page, and that AI Overviews are enhancing user satisfaction and driving product growth [12]
- Internally, Google's AI tools generate about 30% of code and improve engineering efficiency by 10%, allowing programmers to focus on more creative tasks [12]
- Pichai believes we are in an unbalanced phase of artificial intelligence and predicts that achieving AGI before 2030 will be challenging, while asserting that AI's recursive self-improvement will make it a more significant technological invention than electricity [12]