Speech Recognition
Doubao Input Method launches: after two days with it, I no longer want to type in WeChat chats
Xin Lang Cai Jing· 2025-11-24 16:25
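There are no messy promotional pop-ups or membership ads. That said, the installation package is rather large: the vivo app store lists it at 139MB. And for all that size, the feature set is incomplete; "bare-shell apartment" is an apt description. If you have used the input methods that ship with some phone systems, you will know the helpless feeling that they never seem to have learned Chinese. That is precisely why third-party input methods have flourished. When the official input method falls short, users vote with their feet. ByteDance recently released Doubao Input Method 1.0, aiming to redefine the input experience with AI. While it was still fresh, I downloaded it and tried it for two days, and found some details that are both lovable and frustrating. Speech recognition far ahead of the pack: how strong is Doubao Input Method's "killer feature"? The interface of Doubao Input Method is minimalist. Put simply, it converts speech into text so that the machine can "understand" what you are saying. Compared with earlier large ASR models, Seed-ASR reduces the error rate on Chinese and English test sets by 10%-40%. First, the core competitive strength of Doubao Input Method. In my view, voice typing is its killer feature; the experience is a generation ahead and clearly much better than the other input methods I have used. Over these two days I have fallen in love with voice typing. In one sentence: hold to speak, release to finish. In testing, recognition accuracy for Chinese, English, and Cantonese was absurdly high. I casually said "I am testing Doubao Input Method in Mandarin," and it got every word right, then ...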
Doubao Input Method officially launches with accurate speech recognition and support for multiple dialects
Xin Lang Ke Ji· 2025-11-24 09:00
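Doubao Input Method has reportedly launched on Android app stores and will arrive on the Apple App Store soon. Editor in charge: 何俊熹. For voice input, Doubao Input Method can recognize speech accurately in complex conditions, handling soft speech, fast speech, and noisy environments. When recognition makes an error, the user only needs to correct it manually once and the system remembers the correction, giving a personalized recognition effect described as "adjust once, stay accurate long-term." For some high-frequency voice input errors, the user only needs to move the cursor to the right of the text and the system automatically pops up homophone options (such as "他/她/它", which are "he/she/it" and share the pronunciation tā), with no need to re-record. For dialects, Doubao Input Method currently supports Cantonese, Sichuanese, Shaanxi dialect, Jianghuai Mandarin, Jilu Mandarin, Lanyin Mandarin, Jin Chinese, and other dialects; recognition accuracy for some dialects approaches that of Mandarin, further improving the experience for users in different regions. Sina Tech, afternoon of November 24: Doubao Input Method recently launched officially, offering both voice and keyboard input. Built on the same speech model as the Doubao App, it further improves speech recognition and semantic understanding, supports multiple dialects, English, and mixed Chinese-English input, and provides automatic error correction; keyboard input also supports automatic correction and smart suggestions for text, symbols, emoji, and more. ...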
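The "correct once, remember" behavior and the homophone pop-up described in this summary can be illustrated with a toy sketch. Everything here is hypothetical: the `HOMOPHONES` and `PINYIN` tables, the `CorrectionMemory` class, and the per-pronunciation preference are assumptions made for illustration, not Doubao Input Method's actual design, which the article does not describe.

```python
# Toy illustration only: the tables and class below are hypothetical,
# not Doubao Input Method's implementation.

# Hypothetical pronunciation tables; a real input method would use a full lexicon.
HOMOPHONES = {"ta": ["他", "她", "它"]}
PINYIN = {"他": "ta", "她": "ta", "它": "ta"}


class CorrectionMemory:
    """Remembers which homophone the user picked for a given pronunciation."""

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}

    def candidates(self, word: str) -> list[str]:
        """Return homophone options for `word`, remembered choice first."""
        key = PINYIN.get(word)
        options = HOMOPHONES.get(key, [word])
        preferred = self._preferred.get(key)
        # sorted() is stable, so the other options keep their original order.
        return sorted(options, key=lambda w: w != preferred)

    def correct(self, recognized: str, chosen: str) -> None:
        """Record a one-time manual correction from `recognized` to `chosen`."""
        key = PINYIN.get(recognized)
        if key is None or PINYIN.get(chosen) != key:
            raise ValueError("chosen word is not a known homophone")
        self._preferred[key] = chosen


memory = CorrectionMemory()
before = memory.candidates("他")
memory.correct("他", "她")
after = memory.candidates("他")
```

A context-free preference like this is deliberately crude; a production system would presumably condition on the surrounding sentence rather than on pronunciation alone.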
A ChatGPT moment for translation: Meta releases a new model that learns obscure new languages from a few examples
3 6 Ke· 2025-11-11 12:12
Core Insights
- Meta has launched the Omnilingual ASR system, capable of recognizing over 1,600 languages, aiming to bridge the digital divide in language technology [1][2][5]
- The system supports 500 languages that had never been transcribed by any AI system before, significantly expanding language coverage compared to existing models like OpenAI's Whisper [5][7]
- Omnilingual ASR introduces a flexible learning mechanism that allows the model to learn new languages from minimal examples, potentially expanding its coverage to over 5,400 languages [10][11]

Language Coverage and Performance
- The system achieves a character error rate (CER) below 10% for 78% of the tested languages; that share rises to 95% for languages with over 10 hours of training data [7][8]
- It categorizes languages into high-resource, medium-resource, and low-resource, with 95% of high- and medium-resource languages achieving a CER below 10% [8]
- Even low-resource languages show promise, with 36% achieving a CER below 10% [8]

Open Source and Community Engagement
- Omnilingual ASR is fully open-sourced on GitHub, allowing researchers, developers, and organizations to use and modify the model without licensing restrictions [11][13]
- Meta has released a large multilingual speech dataset, including transcriptions for 350 underrepresented languages, to support community-driven language recognition [13][14]
- The development involved collaboration with local language organizations to gather diverse voice samples, ensuring cultural sensitivity and community involvement [15][16]

Technical Specifications
- The model family spans a range of sizes, from lightweight models suitable for mobile devices to larger models with up to 7 billion parameters for high accuracy [16][18]
- Training used over 4.3 million hours of audio data across 1,239 languages, making it one of the largest and most diverse speech training datasets ever [18]
- The system's design allows for continuous growth and adaptation, enabling it to evolve alongside the diversity of human languages [18]
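The performance figures above are stated in terms of character error rate (CER). As a minimal sketch, assuming the standard definition (character-level edit distance divided by reference length), the metric and the "share of languages below 10% CER" statistic can be computed as follows. The function names are illustrative, and the summary does not say how Meta normalized text before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def share_below(cers: list[float], threshold: float = 0.10) -> float:
    """Fraction of per-language CER values strictly below `threshold`."""
    return sum(c < threshold for c in cers) / len(cers)


# One substituted character out of four gives a CER of 0.25.
example = cer("abcd", "abed")
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.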
X @𝘁𝗮𝗿𝗲𝘀𝗸𝘆
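I tested the native input method's speech recognition on macOS 26, and it still gets crushed by the Soniox model. It is barely usable at best and far from good, especially with mixed Chinese-English input. ...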
140+ page slide deck details global technology trends and the latest progress in the materials industry
材料汇· 2025-07-18 15:50
Core Viewpoint
- The article discusses the latest advancements and trends in artificial intelligence and robotics, highlighting various innovative fields and technologies that are shaping the future of these industries.

Group 1: Artificial Intelligence and Robotics
- Artificial intelligence aims to replicate human-like intelligence in machines, encompassing areas such as robotics, language recognition, and image recognition [12][19][22]
- Key technologies in AI include machine learning, neural networks, and natural language processing, which are essential for developing intelligent systems [19][22]
- The rise of swarm intelligence is noted, where the collective behavior of multiple agents can lead to enhanced problem-solving capabilities in various applications [15][16]

Group 2: Innovative Fields
- Nine major innovative fields are identified, including human-machine interaction, biohybrids, and radical social innovation breakthroughs [8][89]
- The article emphasizes the importance of interdisciplinary research in driving advancements in these fields, particularly in integrating AI with other technologies [8][89]

Group 3: Emerging Technologies
- Technologies such as hyperspectral imaging, speech recognition, and touchless gesture recognition are highlighted for their potential applications in various sectors [10][13][29]
- The development of flying cars and autonomous vehicles is discussed, emphasizing the need for advancements in materials and battery technology to make these innovations feasible [32][33]

Group 4: Material Innovations
- Liquid metal technology is presented as a frontier material with applications in electronics and flexible devices, showcasing its unique properties [34][37]
- The article also covers advancements in high-temperature alloys and carbon fiber, which are crucial for the aerospace and automotive industries [39][56]

Group 5: Future Directions
- The article suggests that the integration of AI with neuroscience could lead to breakthroughs in understanding human cognition and developing smarter systems [24][90]
- It calls for continued investment in research and development to maintain competitiveness in the global market for AI and robotics technologies [86][88]
Engineer hand-builds a hardcore AI desk pet: connected to GPT-4o, it understands speech and interacts, and the build is reproducible
量子位· 2025-07-16 07:02
Core Viewpoint
- The article discusses the creation of an AI pet named Shoggoth, inspired by the Pixar lamp robot, which uses GPT-4o and 3D printing technology to interact with humans in a pet-like manner [1][48].

Group 1: AI Pet Development
- Shoggoth is designed to communicate and interact with users, potentially replacing traditional stuffed toys as childhood companions [5][52].
- The robot's structure is simple, featuring a base with three motors and a 3D-printed conical head, along with a flexible tentacle system inspired by octopus grabbing strategies [8][10].
- The robot can adapt to various object sizes and weights and is capable of handling items up to 260 times its own weight [8].

Group 2: Control and Interaction Mechanisms
- Shoggoth employs a dual-layer control system: low-level control using preset actions and high-level control using GPT-4o for real-time processing of voice and visual events [25][26].
- The robot's perception includes hand tracking and tentacle tip tracking, using models such as YOLO together with 3D triangulation [30][33].
- A 2D mapping system simplifies the control of tentacle movements, allowing users to manipulate the robot via a computer touchpad [22][24].

Group 3: Technical Challenges and Solutions
- Initial designs faced issues with cable entanglement, which were addressed by adding a cable spool cover and calibration scripts to improve tension control [14][16][17].
- The design also required reinforcement of the "spine" structure to prevent sagging under its own weight [18].
- The final model successfully transitioned from simulation to real-world application, validating the effectiveness of the control strategies implemented [38].

Group 4: Creator Background
- The creator, Matthieu Le Cauchois, is an ML engineer with a background in reinforcement learning, speech recognition, and NLP, having previously founded an AI company [39][41].
- His work includes various innovative projects, showcasing his expertise in machine learning and robotics [46][48].
Opening up the humanoid robot track requires vigilance against hype
Group 1
- The humanoid robot industry is experiencing rapid growth, with a significant increase in the number of related companies and production output in China [4][5]
- In the past year, 230,000 robot-related companies were established in China, marking a 22.7% year-on-year increase [4]
- The production of industrial robots in China surged by 35.5% year-on-year, reaching 69,056 units, while service robots increased by 13.8% to 1.2 million units [4]

Group 2
- Morgan Stanley forecasts that the Chinese robot market will grow at an average annual rate of 23%, expanding from $47 billion in 2024 to $108 billion by 2028 [5]
- The humanoid robot market is expected to grow at a staggering 63% annually, from $300 million this year to $3.4 billion by 2030 [5]
- By 2030, China is projected to have 252,000 humanoid robots, increasing to 302 million by 2050, accounting for 30% of the global total [5]

Group 3
- The automotive industry is actively engaging in the humanoid robot sector, with various companies exploring opportunities in this emerging market [6][10]
- Companies are cautious about the potential "bubble" in the humanoid robot market, drawing parallels to the previous hype surrounding electric vehicles [7][11]
- The industry is advised to focus on practical applications and technological advancements rather than getting caught up in the hype [10][12]

Group 4
- Challenges such as high costs, inefficiency, and complexity in humanoid robot technology remain significant barriers to commercialization [9]
- Key technological advancements are needed in areas like lightweight materials, battery efficiency, and coordination of robotic limbs [9]
- The industry must address issues related to data, computing power, and hardware-software integration to achieve long-term success [9]

Group 5
- The Ministry of Industry and Information Technology has set goals for the humanoid robot industry, aiming for a robust innovation system and international competitiveness by 2027 [12]
- The industry is encouraged to establish a reliable supply chain and expand application scenarios to integrate more deeply into the economy [12]
- Companies are urged to remain rational and focus on technology development and application rather than solely pursuing humanoid robots [11][12]
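The growth forecasts quoted in this summary can be sanity-checked with compound-growth arithmetic. The sketch below is only a consistency check on the cited numbers. It assumes annual compounding and, because the summary carries no date, that "this year" means 2025; the helper names `project` and `implied_rate` are illustrative.

```python
def project(start_value: float, annual_rate: float, years: int) -> float:
    """Value after compounding `annual_rate` for `years` years."""
    return start_value * (1 + annual_rate) ** years


def implied_rate(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Chinese robot market: $47 billion in 2024, 23% per year, through 2028.
market_2028 = project(47.0, 0.23, 2028 - 2024)

# Humanoid robot market: $0.3 billion, 63% per year, through 2030.
# Assumption: "this year" is 2025, giving five years of growth.
humanoid_2030 = project(0.3, 0.63, 2030 - 2025)

# Growth rate implied by the quoted endpoints of the robot market forecast.
market_rate = implied_rate(47.0, 108.0, 2028 - 2024)

print(f"Robot market in 2028: ${market_2028:.1f} billion")
print(f"Humanoid market in 2030: ${humanoid_2030:.2f} billion")
print(f"Implied robot market growth: {market_rate:.1%}")
```

Under these assumptions the projections come out at roughly $107.6 billion and $3.45 billion, consistent with the quoted $108 billion and $3.4 billion after rounding.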