MiniMax Speech 2.6
Jensen Huang Invests in an AI Company That Cloned Elon Musk's Voice
Sou Hu Cai Jing · 2025-11-03 04:14
Core Insights
- Cartesia, a voice AI company, has recently launched its new voice model Sonic-3 and completed a $100 million Series B funding round, with NVIDIA among the investors [1][3][12]

Company Overview
- Cartesia was founded by Karan Goel of the Stanford AI Lab, who previously did notable work on state space models (SSMs) [2][10]
- The company has a strong academic foundation: its core team comes mainly from the Stanford AI Lab and includes co-founder Albert Gu, a notable figure in the development of the Mamba architecture [3][4]

Product Development
- Cartesia has progressed rapidly since its founding, launching its first product, the Sonic voice model, shortly after securing seed funding. It has since released multiple iterations, including Sonic-2.0 and the latest Sonic-3 [6][12]
- Sonic-3 brings significant upgrades, including improved emotional expression and faster response times. Its latency is only 90 milliseconds and its end-to-end response time is 190 milliseconds, making it one of the fastest voice generation systems available [8][12]

Technology Differentiation
- Unlike traditional voice AI models that rely on the Transformer architecture, Sonic-3 is built on an SSM, allowing more natural and context-aware interactions without revisiting the entire conversation history [8][12]
- This approach improves the model's ability to capture emotional nuance and respond fluidly, positioning Cartesia as a leader in real-time voice AI [8][12]

Market Context
- The voice AI sector is advancing quickly, with other companies such as MiniMax also launching competitive products, indicating a growing market for voice models that can handle diverse languages and accents [14]
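The claim that an SSM does not need to revisit the entire conversation history can be illustrated with a toy example. The following is a minimal sketch of a generic diagonal linear state space recurrence, not Cartesia's Sonic-3 implementation; the class `DiagonalSSM`, the helper `full_history_output`, and all parameter values are hypothetical and chosen only for illustration. It shows that a fixed-size state updated one input at a time gives the same output as recomputing from the full input history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagonalSSM:
    """Toy diagonal linear SSM (hypothetical, for illustration only)."""
    a: List[float]  # per-channel state decay
    b: List[float]  # per-channel input weight
    c: List[float]  # per-channel output weight
    state: List[float] = field(init=False)

    def __post_init__(self) -> None:
        # The state has a fixed size, independent of how many inputs arrive.
        self.state = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # State update: h_t = a * h_{t-1} + b * x_t (elementwise per channel)
        self.state = [ai * hi + bi * x
                      for ai, hi, bi in zip(self.a, self.state, self.b)]
        # Readout: y_t = sum_i c_i * h_t[i]
        return sum(ci * hi for ci, hi in zip(self.c, self.state))


def full_history_output(a: List[float], b: List[float], c: List[float],
                        xs: List[float]) -> float:
    """Recompute the latest output by revisiting every past input:
    y_t = sum_i sum_k c_i * a_i^(t-k) * b_i * x_k."""
    t = len(xs) - 1
    return sum(ci * (ai ** (t - k)) * bi * x
               for k, x in enumerate(xs)
               for ai, bi, ci in zip(a, b, c))


# Hypothetical parameters and inputs.
a, b, c = [0.9, 0.5], [1.0, 0.3], [0.7, -0.2]
xs = [1.0, 0.5, -2.0, 3.0]

model = DiagonalSSM(a, b, c)
streamed = [model.step(x) for x in xs]  # one constant-cost update per input
recomputed = full_history_output(a, b, c, xs)
print(streamed[-1], recomputed)
```

Each call to `step` costs the same regardless of how long the input sequence has become, whereas the full-history computation grows with sequence length. Real SSM-based models use learned, much larger state matrices and nonlinear layers around this recurrence.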
Jensen Huang Invests in an AI Company That Cloned Elon Musk's Voice
量子位 (QbitAI) · 2025-11-03 03:12
Core Viewpoint
- Cartesia, an AI voice company, has gained attention with its new voice model Sonic-3 and a recent $100 million Series B funding round, with notable investors including NVIDIA [3][4][13].

Group 1: Company Overview
- Cartesia was founded by Karan Goel of the Stanford AI Lab, who previously did notable work on state space models (SSMs) [5][6][28].
- The company has a strong academic foundation, with a core team drawn mainly from the Stanford AI Lab [7][11].

Group 2: Product Development
- Cartesia's Sonic-3 model is a significant upgrade focused on generating more human-like speech, capturing emotional nuance, and improving response speed [14][15][17].
- The model is built on a state space model (SSM) architecture, which allows faster and more natural responses than traditional Transformer-based models [15][16].

Group 3: Funding and Growth
- The company has progressed rapidly since its founding, securing seed funding in its second year and then launching its first product, Sonic, which generated high-quality, natural-sounding speech [11][12].
- Following a $64 million Series A funding round earlier this year, Cartesia has now completed a $100 million Series B round, reflecting a strategy of pursuing technology development and fundraising in parallel [12][13].
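[Industrial Internet Weekly] 15th Five-Year Plan recommendations: fully implement the "AI+" initiative and seize the commanding heights of AI industry applications; Jensen Huang's latest GTC keynote outlines an AI blueprint; Exiting the Chinese market? SA...
Tai Mei Ti APP · 2025-11-03 02:12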
Domestic News
- ZhiYuan released a multimodal world model Emu3.5, capable of cross-scenario embodied operations and complex interactions [2]
- Zero One Wanwu and Open Source China launched the "Open AgentKit Platform" (OAK), a one-stop open-source solution for AI Agent development [3]
- Boson Quantum won a quantum computing procurement project "Tianchen AI" from China Merchants Bank, providing quantum optimization algorithms and computing power [4]
- Hand Information aims to achieve 300 million yuan in AI-related revenue this year, with a target of doubling next year [5]
- Meituan launched and open-sourced the LongCat-Video model, enhancing video generation speed by 10.1 times [7]
- Shengbang Security released a 200G high-speed link encryption gateway, achieving a throughput of over 200 Gbps and a latency of less than 3 microseconds [8]
- DingTalk introduced a "1+4+N" AI solution for the mining industry, with nearly 50% of China's top 500 mining companies using DingTalk [9]
- Doubao video generation model 1.0 pro fast was launched, achieving a speed increase of about 3 times and a price reduction of 72% [10]
- Yimu Technology showcased a bionic tactile sensor at IROS, enhancing robotic interaction capabilities [11]
- The world's first full-size bionic robot for classroom teaching was launched in Hefei, marking a significant step in AI education applications [12]
- Liwu Copper and Huawei signed a framework cooperation agreement to promote intelligent transformation in mining [13]
- The Chinese Academy of Sciences Hong Kong Innovation Research Institute and Huawei launched a new generation medical AI model CARES 3.0 [14]
- MiniMax released the Hailuo 2.3 video generation model, improving dynamic expression and style presentation [15]
- Duodian Intelligence partnered with Circle to accelerate the construction of a unique ecosystem combining retail, fintech, and Web3 [16]
- Tanjike Technology launched a large model intelligent agent platform for AI digital employees, enhancing human-machine collaboration [17]
- Wanlian Yida Group announced the launch of its first full-industry AI model "Wanlian Moore" [18]
- MiniMax introduced the Speech 2.6 model, achieving audio generation latency below 250 milliseconds [19]
- DingTalk released the DingTalk A1 Lite AI hardware, facilitating efficient voice communication management [20]
- SAS China reportedly faces mass layoffs, raising concerns about its future in the Chinese market [21]

Overseas News
- OpenAI is offering a one-year free ChatGPT Go service to users in India to expand its market presence [22]
- NVIDIA's GTC conference highlighted advancements in AI, including partnerships with Oracle and CrowdStrike [23]
- Foxconn plans to invest $1.37 billion in AI computing clusters and supercomputing centers [24]
- Meituan's international delivery brand Keeta launched operations in Abu Dhabi [25]
- OpenAI completed a capital restructuring, solidifying Microsoft's stake in the company [26]
- Blackstone-backed AirTrunk partnered with Saudi HUMAIN to invest approximately $3 billion in data centers [27]
- Amazon announced plans to lay off about 14,000 employees to streamline operations and accelerate AI deployment [28]
- OpenAI introduced Aardvark, a self-driven cybersecurity research agent powered by GPT-5 [29]

Financing and Mergers
- Pengnao Technology completed tens of millions in angel round financing for brain-computer interface technology development [31]
- Ant Group acquired a stake in AI hardware developer Aide Future Intelligent [32]
- Songyan Power completed nearly 300 million yuan in Pre-B round financing for humanoid robot development [33]
- Global AI platform MAI raised $25 million in seed funding to enhance its AI Agent capabilities [34]
- Microsoft signed a new agreement with OpenAI to strengthen their partnership [35]
- Apex Context, founded by former ByteDance and Volcano AI executives, secured millions in funding for AI-driven marketing solutions [36]
- Pyromind Dynamics raised millions in seed funding to expand its reinforcement learning services [37]

Policies and Trends
- Shandong Province aims to achieve comprehensive low-altitude communication network coverage by 2030 [41]
- Shandong is accelerating the construction of 5G-A and integrated sensing base stations [42]
- Shanghai plans to establish a millisecond-level computing network by 2027 [48]
- The Ministry of Transport is promoting large-scale AI applications in transportation [47]
- The National Development and Reform Commission encourages the transformation of inefficient computing facilities [50]
Tencent Research Institute: Top 50 AI Keywords of the Week
Tencent Research Institute (腾讯研究院) · 2025-11-01 02:33
Core Insights
- The article presents a weekly roundup of the top 50 keywords related to AI developments, highlighting significant trends and innovations in the industry [2].

Group 1: Chips
- Vera Rubin is a notable keyword associated with NVIDIA, indicating advancements in chip technology [3].
- Qualcomm has introduced a new AI inference solution, showcasing its commitment to enhancing AI capabilities [3].

Group 2: Models
- OpenAI has developed a safety classification model, emphasizing the importance of security in AI applications [3].
- Cursor has launched its self-developed Composer model, reflecting the trend of companies creating proprietary AI models [3].
- NVIDIA's OmniVinci model and MiniMax's M2 model are also highlighted, indicating ongoing innovation in AI modeling [3][4].

Group 3: Applications
- Sora has introduced a role cameo feature, enhancing user interaction with AI [3].
- MiniMax Speech 2.6 and Beijing Zhiyuan's WuJie·Emu3.5 are examples of new AI applications aimed at improving communication [3].
- Adobe's Firefly Image 5 and Tencent's interactive AI podcast demonstrate the growing integration of AI in creative and media sectors [3][4].

Group 4: Technology
- The NEO home robot by 1X Technologies and the LeRobot v0.4.0 by Hugging Face represent advancements in consumer robotics [4].
- Neuralink's PRIMA artificial vision and Merge Labs' ultrasound brain-machine interface highlight significant technological innovations in AI and neuroscience [4].

Group 5: Capital
- OpenAI is undergoing a capital structure reorganization and has plans for an IPO, indicating its growth and potential market impact [4].

Group 6: Events and Opinions
- There is a call for copyright protection in Japan, reflecting ongoing discussions about intellectual property in the AI space [4].
- Yoshua Bengio's new definitions of AGI and insights on mental health data from OpenAI indicate evolving perspectives on AI's role in society [4].
Tencent Research Institute AI Express 20251031
Tencent Research Institute (腾讯研究院) · 2025-10-30 16:06
Group 1: OpenAI Developments
- OpenAI has open-sourced the gpt-oss-safeguard safety classification model in 120-billion- and 20-billion-parameter versions, which can read policy documents directly and classify content without retraining [1]
- The model outperforms GPT-5-thinking in multiple benchmark tests, achieving industry-best cost-effectiveness on content moderation evaluation sets and the ToxicChat dataset [1]
- OpenAI has used this technology internally (the Safety Reasoner prototype) for image generation and products like Sora 2, with safety reasoning computing accounting for 16% of its operations [1]

Group 2: Cursor 2.0 Update
- Cursor has released version 2.0, introducing its first self-developed coding model, Composer, which generates 250 tokens per second, four times faster than similar leading systems [2]
- Composer uses a mixture of experts (MoE) architecture optimized for software engineering through reinforcement learning, achieving cutting-edge performance in Cursor Bench evaluations [2]
- The new interface supports multi-agent parallel collaboration, letting different models work on the same task simultaneously using git worktree or remote machines, and includes native browser tools for testing iterations [2]

Group 3: Sora New Features
- Sora has launched the Character Cameo feature, which keeps non-human cameo characters consistent and allows virtual characters to be extracted from generated videos for reuse [3]
- New video splicing functionality and community rankings have been added, listing the most used cameo characters and the most remixed videos [3]
- Sora has temporarily lifted the invitation code requirement for direct registration in the US, Canada, Japan, and South Korea, coinciding with the launch of its Android version to capture the Android market [3]

Group 4: MiniMax Speech 2.6 Update
- MiniMax Speech 2.6 has achieved an end-to-end latency of under 250 milliseconds, reaching industry-leading levels and becoming the underlying technology engine for global voice platforms like LiveKit and Pipecat [4]
- The new version directly converts non-standard text formats such as URLs, emails, phone numbers, dates, and amounts without cumbersome text preprocessing, making information transmission smoother [4]
- The Fluent LoRA feature generates fluent, natural speech even from recordings with accents or non-native fluency, supporting over 40 languages [4]

Group 5: Emu3.5 Launch
- Beijing Zhiyuan has released the Emu3.5 multimodal world model, a 34-billion-parameter dense transformer pre-trained on over 10 trillion tokens (approximately 790 years of video), revealing the "multimodal scaling paradigm" for the first time [5]
- It uses a "next state prediction" objective to achieve visual narrative and guidance capabilities, matching the performance of Gemini-2.5-Flash-Image in image editing tasks [5]

Group 6: OpenAI IPO Plans
- OpenAI plans to submit its IPO application as early as the second half of 2026, aiming to raise at least $60 billion at a valuation that could reach $1 trillion, which would make it the largest IPO in history [6]
- Following a restructuring, the non-profit organization will hold 26% of the newly formed OpenAI Group, while Microsoft will relinquish exclusive cloud service priority but will receive an additional $250 billion Azure procurement contract [6]
- The new agreement stipulates that the realization of AGI must be verified by independent experts and extends Microsoft's rights to use OpenAI technology until 2032, while allowing Microsoft to conduct AGI research independently or with third parties [6]

Group 7: OpenFold3 Release
- The OpenFold Consortium has released a preview of OpenFold3, trained on over 300,000 experimental structures and 13 million synthetic structures, capable of predicting interactions between proteins and small molecule ligands, as well as nucleic acids [7]
- In single-stranded RNA structure prediction, its performance rivals that of AlphaFold3, and its modular design allows users to modify the model for native data interpretation [7]
- All components are licensed under Apache 2.0, permitting commercial use, with companies like Novo Nordisk, Outpace Bio, and Bayer planning to use the model to accelerate research [7]

Group 8: Anthropic Research Findings
- Anthropic's latest research shows that Claude can detect and report concepts injected by humans, with the strongest models achieving a 20% success rate in introspection [8]
- The research team found that when a concept was retroactively injected to falsify a model's internal state, the model would defend its "error" and fabricate reasons for it [8]
- Experiments demonstrate that AI has deliberate control over internal representations, marking the emergence of "access consciousness," though it remains far from having subjective experience or "phenomenal consciousness" [8]

Group 9: Grokking Research Insights
- Former Meta FAIR head Tian Yuandong published research on Grokking, proving mathematically that models require only O(M log M) samples for generalization, significantly lower than the traditional M² requirement [9]
- He showed that the essence of "insight" is a multi-peak non-convex optimization process, in which more data raises the "generalization peak" above the "memory peak," leading to a transition from memorization to generalization [9]
- Tian emphasized that representation learning is foundational to all intelligent capabilities, with the loss function serving merely as a proxy signal for optimization, and that true breakthroughs stem from changes in representation methods [9]
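To give a sense of scale for the Grokking result, the gap between O(M log M) and M² can be worked through numerically. This is a rough sketch only: the source does not define M or give constant factors, so the function `sample_counts` is hypothetical, it assumes a constant of 1 and the natural logarithm, and the values of M are arbitrary.

```python
import math
from typing import Tuple


def sample_counts(m: int) -> Tuple[float, int]:
    """Return (M * ln M, M^2) for a problem-size parameter M.

    Constants hidden by the O(.) notation are assumed to be 1 here,
    so only the growth trend is meaningful, not the absolute numbers.
    """
    return m * math.log(m), m * m


for m in (16, 128, 1024):  # arbitrary illustrative sizes
    n_log, n_sq = sample_counts(m)
    print(f"M={m}: M log M ~ {n_log:.0f}, M^2 = {n_sq}, ratio ~ {n_sq / n_log:.1f}")
```

The ratio M² / (M log M) equals M / log M, so the saving grows as M increases.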