Scribe v2 Realtime
Tencent Research Institute AI Express 20251113
Tencent Research Institute · 2025-11-12 16:08
Group 1: Generative AI Developments
- Meta's Chief AI Scientist LeCun is leaving the company due to strategic disagreements, focusing on "world models" in a new startup [1]
- Google's AI model successfully transcribed an 18th-century ledger with a character error rate of only 1.7%, showcasing advanced abstract reasoning capabilities [2]
- ElevenLabs launched the Scribe v2 Realtime model, achieving a 93.5% accuracy rate across 90 languages with a latency of just 150 milliseconds [3]

Group 2: AI in Communication and Music
- OpenAI is set to introduce a group chat feature for ChatGPT, allowing users to share conversation links while maintaining privacy [4]
- An AI-generated song topped the Billboard country digital singles chart, raising concerns about the competition between AI and human artists [5]

Group 3: Investment and Financing in AI
- The AI company Jiga Vision completed a financing round of over 100 million yuan, with investments from Huawei and other funds [6]
- Gamma, an AI presentation tool, raised $68 million in Series B funding, achieving a valuation of $2.1 billion and generating an annual recurring revenue of $100 million [9]

Group 4: Programming Language Trends
- TypeScript has surpassed Python as the most widely used programming language on GitHub, with a 66% year-over-year increase in contributors [8]
Silicon Valley Buzz: The Fastest Speech-to-Text Model
QbitAI · 2025-11-12 08:01
Core Insights
- The article discusses the launch of Scribe v2 Realtime, a cutting-edge speech-to-text model by ElevenLabs, which has garnered significant attention in Silicon Valley for its impressive performance metrics [3][4][16].
- The model boasts a latency of just 150 milliseconds and an accuracy rate of 93.5%, supporting over 90 languages, marking a significant advancement in the field of real-time speech transcription [4][10][15].

Company Overview
- ElevenLabs, founded in 2022, focuses on AI voice technology and has quickly established itself in the industry, achieving over $200 million in revenue within 20 months of operation [18][21].
- The company's founding team includes former Google machine learning engineers and Palantir strategists, emphasizing a strong technical background [19][23].
- ElevenLabs operates with a unique team structure, consisting of small, agile teams without formal titles, allowing for efficient decision-making and operations [23].

Product Features
- Scribe v2 Realtime is designed to handle various audio formats and includes features like voice activity detection and customizable audio stream processing, enhancing its usability for diverse applications [10][12].
- The model has shown remarkable adaptability, accurately transcribing speech even in noisy environments and with complex terminology, which is a significant improvement over previous models [9][13].

Industry Context
- The real-time speech-to-text sector has evolved through multiple technological iterations, with earlier models struggling with accuracy and latency issues, often exceeding 30% error rates in noisy conditions [13][14].
- The introduction of the Transformer architecture has alleviated the long-standing trade-off between speed and accuracy, enabling models like Scribe v2 Realtime to achieve both high accuracy and low latency [14][15].
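
The summary quotes an accuracy rate and error rates without saying how they were measured. One common convention for speech-to-text, assumed here purely for illustration and not confirmed by the source, is to report accuracy as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch of that calculation follows; the function names are hypothetical and are not part of any ElevenLabs API.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def accuracy(reference, hypothesis):
    """Accuracy under the assumed convention: 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"accuracy: {accuracy(ref, hyp):.3f}")
```

Passing character sequences instead of word lists to `edit_distance` gives the analogous character error rate. Note that WER can exceed 1 when the hypothesis contains many insertions, so accuracy defined this way can go negative for very poor transcripts.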