FireRedChat

Xiaohongshu Releases FireRedChat: The First Full-Duplex Large-Model Voice Interaction System That Supports Private Deployment
Sou Hu Cai Jing· 2025-10-03 14:28
Core Insights
- The article introduces FireRedChat, which it describes as the industry's first full-duplex large-model voice interaction system that supports private deployment. It targets high latency, noise sensitivity, and poor controllability [2][7][18]
- FireRedChat aims to create a more natural and empathetic AI voice assistant that understands and responds to emotional cues, improving the user experience [4][8][18]

Group 1: System Features
- FireRedChat is built on a complete architecture of "interaction controller + interaction module + dialogue manager," which allows a half-duplex pipeline to be upgraded to full-duplex [2][11]
- The system integrates in-house models such as pVAD (streaming personalized voice activity detection for barge-in) and EoT (semantic end-of-turn detection), which improve real-time responsiveness and robustness while reducing interference from external noise [7][11]
- It offers two deployment options, cascaded and semi-cascaded, which trade off stability, conversational warmth, and cost to suit different business needs [7][11]

Group 2: Performance Metrics
- Experimental results indicate that FireRedChat outperforms other open-source frameworks on key performance indicators, with latency in local deployments close to that of industrial-grade systems [7][15][18]
- The system's false barge-in rate is 10.2%, significantly lower than that of competing frameworks, which indicates that it handles interruptions during conversation effectively [15]
- EoT improves the accuracy of semantic end-of-turn detection, reducing awkward pauses and interruptions [15]

Group 3: User Experience
- The AI assistant built on FireRedChat is designed to provide more human-like interaction, with emotional perception and empathetic responses [4][8]
- It aims to create a sense of companionship, so that users feel understood and supported during conversations [4][8]

Group 4: Open Source and Deployment
- FireRedChat is fully open-source, allowing developers and enterprises to deploy it in private environments without external dependencies or API costs [12][18]
- The system's modular design facilitates integration and customization, making it accessible to ordinary users and developers alike [12][18]

Group 5: Future Outlook
- The FireRed Team plans to continue iterating on FireRedChat, adding more advanced features and engaging with the global open-source community to improve the usability of voice AI [18]
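
The turn-taking behaviour summarized above (a personalized VAD that reacts only to the target speaker, plus a semantic end-of-turn signal) can be pictured with a toy state machine. The Python sketch below is illustrative only: the class names, the `Frame` fields, the thresholds, and the action strings are hypothetical and are not taken from the FireRedChat codebase. It assumes each audio frame already arrives with a target-speaker probability and an end-of-turn probability.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    LISTENING = "listening"  # waiting for, or receiving, user speech
    SPEAKING = "speaking"    # assistant audio is playing


@dataclass
class Frame:
    """Hypothetical per-frame model outputs (not FireRedChat's real interface)."""
    target_speaker_prob: float  # pVAD-style score: is the enrolled user speaking?
    end_of_turn_prob: float     # EoT-style score: has the user finished the turn?


class TurnController:
    """Toy full-duplex controller; the 0.5 thresholds are illustrative assumptions."""

    def __init__(self, vad_threshold: float = 0.5, eot_threshold: float = 0.5):
        self.vad_threshold = vad_threshold
        self.eot_threshold = eot_threshold
        self.state = State.LISTENING
        self.user_has_spoken = False

    def step(self, frame: Frame) -> str:
        user_speaking = frame.target_speaker_prob >= self.vad_threshold

        if self.state is State.SPEAKING:
            if user_speaking:
                # Barge-in: stop playback and give the floor back to the user.
                self.state = State.LISTENING
                self.user_has_spoken = True
                return "interrupt"
            # Low target-speaker score (e.g. background noise): keep talking.
            return "continue_speaking"

        # State is LISTENING from here on.
        if user_speaking:
            self.user_has_spoken = True
            return "listen"
        if self.user_has_spoken and frame.end_of_turn_prob >= self.eot_threshold:
            # The user is silent and the turn looks semantically complete.
            self.state = State.SPEAKING
            self.user_has_spoken = False
            return "respond"
        # A pause that does not look like the end of the turn: keep waiting.
        return "listen"
```

In this sketch a false barge-in corresponds to an `"interrupt"` action fired while the real user was not speaking, and a premature reply corresponds to `"respond"` fired during a mid-sentence pause, which are the two failure modes the summary attributes to pVAD and EoT respectively.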
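Xiaohongshu Releases FireRedChat: The First Full-Duplex Large-Model Voice Interaction System That Supports Private Deployment
机器之心 (Jiqizhixin) · 2025-10-02 03:12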
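Online demo: https://fireredteam.github.io/demos/firered_chat
Source code: https://github.com/FireRedTeam/FireRedChat

Xiaohongshu's intelligent-creation audio team (小红书智创音频团队) has launched FireRedChat, the industry's first full-duplex large-model voice interaction system that supports private deployment. Its in-house streaming pVAD and EoT models make voice interaction more natural, and the initial release ships both a cascaded and a semi-cascaded implementation, with end-to-end latency approaching that of industrial-grade applications. The system is fully open source and can be deployed in private environments, with the goal of a voice AI that truly "senses how you feel, empathizes, and knows how to express itself."

FireRedChat targets pain points such as high latency, noise sensitivity, poor controllability, and dependence on external APIs.

FireRedChat is built on a complete architecture of "interaction controller + interaction module + dialogue manager," which upgrades any half-duplex pipeline to full-duplex in one step. It integrates core models including the in-house streaming personalized barge-in model pVAD, the semantic end-of-turn model EoT, FireRedTTS-1s, FireRedASR, and FireRedTTS2, and it provides two end-to-end service deployment options, cascaded and semi-cascaded, covering needs that range from "stable and easy to deploy" to "more warmth." This significantly improves real-time performance and ...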
Unexpectedly, the Most Thoroughly Open-Source Player in Large Audio Models Turns Out to Be Xiaohongshu
机器之心 (Jiqizhixin) · 2025-09-17 09:37
Core Viewpoint
- The article highlights the recent surge in open-source AI models in the audio domain, particularly from domestic companies in China, with a focus on Xiaohongshu's progress in developing high-quality audio models and fostering an open-source community [1][4][22].

Summary by Sections

Open Source Trends
- In recent months, open source has become a focal point in the AI community, especially among domestic tech companies, with 33 and 31 models open-sourced in July and August respectively [1].
- Most of these open-source efforts are concentrated in text, image, video, reasoning, and world models, while audio generation remains a smaller segment [1][2].

Xiaohongshu's Contributions
- Xiaohongshu has open-sourced audio technologies at a steady pace since last year, releasing models such as FireRedTTS for text-to-speech (TTS) and FireRedASR for automatic speech recognition (ASR), and achieving state-of-the-art (SOTA) results [3][4].
- Open-sourcing high-quality audio models strengthens Xiaohongshu's technical influence and signals a long-term strategic commitment to open-source development [4][22].

Technical Achievements
- Xiaohongshu's FireRedTTS model allows flexible voice synthesis, imitating various speaking styles with minimal training [6][9].
- FireRedASR has achieved a character error rate (CER) of 3.05%, outperforming closed-source models [7][8].
- The new FireRedTTS-2 model addresses existing challenges in voice synthesis, improving long-dialogue synthesis and achieving industry-leading performance in audio scene modeling [9][11].

Ecosystem Development
- Xiaohongshu aims to build a comprehensive open-source community around audio models, covering TTS, ASR, and voice dialogue systems, thereby lowering industry entry barriers and fostering innovation [22][23].
- The introduction of FireRedChat, a fully open-source duplex voice dialogue system, represents a significant advance, giving developers a complete solution for building their own voice assistants [17][22].

Future Plans
- Xiaohongshu plans to release additional models, including FireRedMusic and FireRedASR-2, to extend its audio technology stack and support a broader range of applications [22][26].
- The company is committed to establishing itself as a leader in the open-source audio domain, with a focus on industrial-grade, commercially viable models [23][26].

Industry Impact
- The article emphasizes that open-source initiatives are reshaping the AI landscape, making advanced capabilities accessible to a wider audience and fostering a collaborative environment for innovation [25][26].
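
For readers unfamiliar with the character error rate cited above, CER is conventionally the character-level edit distance between the reference transcript and the recognizer's output, divided by the reference length. The Python sketch below shows that standard definition; the function names are illustrative, and the source does not say how the reported 3.05% figure normalizes text (punctuation, casing, whitespace), so this is not a reproduction of that benchmark.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a CER of 3.05% means roughly three character-level errors per hundred reference characters, and the value can exceed 100% when the hypothesis contains many insertions.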