Humanoid Robot Achieves Realistic Multilingual Lip Movements

Core Insights
- A new framework developed by scientists at Columbia University enables humanoid robots to generate realistic lip movements synchronized with audio, enhancing human-like communication capabilities [1][3]
- The technology generalizes well, producing lip movements for multiple languages, including French, Chinese, and Arabic, even when those languages were absent from the training data [1][3]

Group 1: Technology Development
- The research team previously published a 2024 study describing a humanoid robot that could predict and replicate human smiles, laying the groundwork for more precise matching between lip motion and voice [3]
- A learning process was designed that begins with collecting visual data of the robot's own lip movements, which is used to train models and generate motion reference points [3]
- A "facial action converter" module was developed to produce motion commands, allowing the robot's lips to move fluidly in sync with different words [3] (a toy sketch of this kind of pipeline appears at the end of this summary)

Group 2: Mechanical Design
- The humanoid robot features a specially designed facial structure with soft silicone skin and magnetic connectors, providing 10 degrees of freedom for complex lip movements [3]
- The lip mechanism can form mouth shapes covering 24 consonants and 16 vowels, improving the robot's ability to produce natural-looking speech [3]

Group 3: Validation and Performance
- The team used ChatGPT to generate test phrases and synthesized videos with ideal lip movements as a reference for comparison [3] (a minimal example of such a comparison metric is sketched at the end of this summary)
- Results indicated that the new method outperformed five baseline approaches, with the generated lip movements deviating only minimally from the ideal videos [3]
- The framework can also generate natural lip-sync effects for the phonetic structures of 11 non-English languages [3]

Group 4: Potential Applications
- The research team suggests that humanoid robots with this capability could be used in education and elderly care, highlighting the technology's potential impact on these sectors [4]
- The development also raises questions about the ethical implications and potential misuse of such technology in the future [4]
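
To make the described data flow concrete, the sketch below shows one way an audio-driven lip pipeline of this kind might be wired together: audio frames are encoded into features, mapped to lip-landmark targets, and then converted into commands for the robot's 10 lip actuators. All names (`encode_audio`, `LandmarkPredictor`, `FacialActionConverter`) and the linear stand-in models are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal, hypothetical sketch of an audio-to-lip-motion pipeline.
# The class/function names and the random linear "models" are assumptions
# made for illustration; only the 10-actuator count comes from the article.
import numpy as np

NUM_ACTUATORS = 10          # reported degrees of freedom in the robot's face
LANDMARKS_PER_FRAME = 20    # assumed number of tracked lip landmark points


def encode_audio(audio: np.ndarray, hop: int = 512) -> np.ndarray:
    """Stand-in audio encoder: frame the waveform and compute a simple
    per-frame energy feature (a real system would use a learned speech
    representation)."""
    frames = np.lib.stride_tricks.sliding_window_view(audio, hop)[::hop]
    return frames.std(axis=1, keepdims=True)  # shape: (num_frames, 1)


class LandmarkPredictor:
    """Maps audio features to 2-D lip-landmark targets per frame.
    A fixed random linear layer stands in for the trained model."""
    def __init__(self, feat_dim: int = 1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(feat_dim, LANDMARKS_PER_FRAME * 2))

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        out = feats @ self.w
        return out.reshape(len(feats), LANDMARKS_PER_FRAME, 2)


class FacialActionConverter:
    """Converts landmark targets into bounded commands for the 10 lip
    actuators. A real converter would be learned from the robot's own
    recorded lip-motion data."""
    def __init__(self, seed: int = 1):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(LANDMARKS_PER_FRAME * 2, NUM_ACTUATORS))

    def __call__(self, landmarks: np.ndarray) -> np.ndarray:
        flat = landmarks.reshape(len(landmarks), -1)
        return np.tanh(flat @ self.w)  # one row of actuator commands per frame


if __name__ == "__main__":
    audio = np.random.default_rng(2).normal(size=16_000)  # 1 s of fake audio
    feats = encode_audio(audio)
    landmarks = LandmarkPredictor()(feats)
    commands = FacialActionConverter()(landmarks)
    print(commands.shape)  # (num_frames, 10)
```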
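
The summary does not specify how similarity between the robot's lip motion and the ideal synthesized videos was scored; one plausible metric is the mean per-frame Euclidean distance between corresponding lip landmarks, sketched below under that assumption.

```python
# Hedged sketch of one possible evaluation metric: average landmark distance
# between generated lip trajectories and those extracted from the "ideal"
# reference video. The metric choice is an assumption, not the paper's
# documented scoring procedure.
import numpy as np


def mean_landmark_error(generated: np.ndarray, reference: np.ndarray) -> float:
    """Both inputs: (frames, landmarks, 2) arrays of lip landmark positions.
    Returns the average Euclidean distance over frames and landmarks."""
    n = min(len(generated), len(reference))            # crude length alignment
    diff = generated[:n] - reference[:n]
    return float(np.linalg.norm(diff, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(100, 20, 2))                      # ideal-video landmarks
    gen = ref + rng.normal(scale=0.05, size=ref.shape)       # generated trajectory
    print(f"mean landmark error: {mean_landmark_error(gen, ref):.4f}")
```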