AI Voice 2.0
A Conversation with Fish Audio: $10 Million ARR, 13x Growth in 12 Months, and the Start of AI Voice 2.0's Technical Boom
Founder Park · 2026-02-26 14:35
Core Insights
- The article discusses the evolution of AI voice technology, highlighting Fish Audio's growth and its positioning as the second-largest AI voice generation platform globally, with a 13-fold increase in revenue to reach $10 million ARR and over 350 million users [6][29].
- Fish Audio's S1 model is noted as the world's first TTS model capable of natural language emotion control, which sets it apart from competitors like ElevenLabs [7][10].
- The company emphasizes the importance of "dirty data," such as emotional and argumentative audio, which traditional companies often discard, as a valuable asset for training its models [3][19].

Company Overview
- Fish Audio is positioned as a leading AI voice generation platform, providing multilingual TTS and high-precision voice cloning services to a diverse user base, including game developers and content creators [5][6].
- The platform has achieved significant user engagement, with over 1 million monthly active users and a marketplace of 1.1 million public voice models [6][32].
- The company originated from an open-source project, leveraging its community roots to build a strong user base and product offerings [6][41].

Product Development
- The S1 model allows users to control emotional expression in generated speech, with future iterations like S2 expected to enhance features such as multi-speaker support and lower latency [7][21].
- Fish Audio's approach to data collection focuses on high-quality, diverse datasets, including emotional and multi-speaker audio, which are crucial for improving model performance [17][19].
- The company plans to release the S2 model, which will incorporate advanced features and a refined data pipeline to enhance the quality of generated audio [21][24].

Market Positioning
- Fish Audio targets both consumer and enterprise markets, with a significant portion of revenue coming from prosumer creators, which is relatively uncommon in the AI infrastructure space [29][30].
- The company aims to differentiate itself from competitors like ElevenLabs by focusing on more engaging and emotionally resonant voice outputs, catering to the entertainment and AI-native application sectors [43][44].
- Future growth strategies include expanding into traditional enterprise markets while maintaining a strong foothold in the AI-native app space [44][45].

Unique Selling Proposition
- The platform's extensive marketplace of user-generated (UGC) voice models, with 1.1 million models, serves as a competitive advantage, enhancing user engagement and attracting enterprise clients [32][36].
- Fish Audio's use of "dirty data" for model training allows for a more nuanced understanding of emotional expression in voice generation, setting it apart from traditional TTS solutions [19][20].
- The company's commitment to open-source principles fosters trust and engagement within the developer community, driving adoption and growth [41][42].