Suno V4.5
China's AI Music Has Quietly Taken the Global No. 1 Spot
QbitAI (量子位) · 2026-03-25 06:31
Core Viewpoint
- China's AI music model Mureka V8, developed by Kunlun Wanwei, has taken the top position on the Artificial Analysis music model leaderboard, ahead of international models such as Suno V4.5 and Udio v1.5 Allegro. The article presents this as a significant milestone for the AI music industry [1][25].
Group 1: Mureka V8's Achievements
- Mureka V8 ranks first in both the vocal and the instrumental categories [2][25].
- Its vocal output is described as natural and emotionally resonant, with clear articulation and nuanced breathing effects [5][20].
- Its instrumental output has also impressed, producing engaging, recognizable riffs that reflect a solid grasp of musical elements [18][19].
Group 2: Development Timeline
- Development has been rapid: eight major versions were released in less than two years, an average of one update roughly every three months [27][28].
- The evolution of Mureka reflects a systematic effort to make AI-generated music not only usable but enjoyable, moving from basic functionality to high-quality production [30][31].
- By the V8 release, Mureka could produce complete songs with coherent structure and emotional depth [38][40].
Group 3: Industry Context
- The article describes a shift in the AI music landscape, in which Chinese models such as Mureka V8 are taking the lead from previously dominant international models [46][47].
- This is part of a broader pattern in the AI sector, where Chinese companies are increasingly catching up with, and in some applications surpassing, their Western counterparts [49][51].
- Kunlun Wanwei's success in AI music is attributed to its long-term investment in turning "good music" into a reproducible system capability, supported by a large domestic user base and diverse application scenarios [57][58].
Tencent Research Institute AI Express 20250507
Tencent Research Institute (腾讯研究院) · 2025-05-06 10:46
Group 1: OpenAI and AI Models
- OpenAI has abandoned its plan to convert fully into a for-profit company. The non-profit organization will continue to control it, while the for-profit entity becomes a Public Benefit Corporation (PBC) [1]
- The restructuring removes the capped-profit system in favor of a conventional equity structure, with the non-profit becoming the main shareholder of the PBC [1]
- OpenAI has committed to developing AGI for the benefit of humanity and plans to open-source some high-performance models [1]
Group 2: NVIDIA's Innovations
- NVIDIA has released the Llama-Nemotron open-source model family, with sizes ranging from 8 billion to 253 billion parameters. The models support dynamic switching of reasoning mode and are released under an open commercial license [1]
- The LN-Ultra model uses the Puzzle framework and FFN fusion to optimize deployment efficiency, surpassing DeepSeek-R1 in reasoning performance and inference throughput [1]
- NVIDIA's Parakeet TDT 0.6B speech recognition model can transcribe 60 minutes of audio in one second, with a word error rate of 6.05% [3]
- Parakeet uses the FastConformer-TDT architecture, can process 24-minute audio segments in a single pass, and supports punctuation prediction and timestamps [3]
Group 3: Music Generation and AI Applications
- The ACE-Step model combines a deep compression autoencoder, a diffusion model, and a linear Transformer. It generates 4 minutes of music in 20 seconds on an A100 GPU, 15 times faster than the baseline [5]
- ACE-Step supports music generation in 19 languages across a range of styles, with advanced controls such as voice cloning and lyric editing [5]
- Suno V4.5 supports generating songs of up to 8 minutes and introduces new styles such as punk rock and jazz house, with more expressive vocals [3]
Group 4: AI in Historical Research
- Researchers have non-invasively read the title of an ancient scroll from Herculaneum using AI image segmentation and ink detection, the first time the title of one of these roughly 2,000-year-old scrolls has been revealed this way [6]
- The result was achieved by two teams, with a reward of $60,000 for the successful retrieval of the text [6]
Group 5: Legal Perspectives on AI
- Legal protection primarily covers specific "expressions" rather than abstract "styles": merely imitating the Ghibli art style typically does not constitute infringement, whereas using specific characters and plots may [7]
- Using unauthorized data to train AI carries legal risk. The traditional "license before use" model is becoming outdated, and relevant legislation and exemption mechanisms are lacking [7]
- Artists' core competitiveness in the face of AI lies in their depth of thought and insight into their times, that is, in a unique perspective rather than mere technical replication [7]