Tencent Research Institute AI Express 20250916
Tencent Research Institute (腾讯研究院) · 2025-09-15 16:01
Group 1: Google Gemini and AI Tools
- Google Gemini has topped the App Store free chart, overtaking ChatGPT, driven by its popular Nano Banana image editing feature [1]
- Gemini is a comprehensive AI toolkit that includes Canvas, Veo 3 video generation, Storybook, and Deep Research, among other functions [1]
- The Google AI suite also includes the NotebookLM knowledge base (up to 300 file uploads), Flow video generation (1080p HD), AI Mode search, and the Gemini CLI local assistant [1]

Group 2: xAI's Grok 4 Fast Model
- xAI has launched the Grok 4 Fast model, which generates 75 tokens per second, ten times faster than the standard version [2]
- User tests indicate that the new model does well on programming and middle school math tasks, solving LeetCode problems in under 2 seconds [2]
- The speed comes at the cost of accuracy, so Grok 4 Fast is best suited to simple queries or tool use; the release reflects xAI's recent focus on speed [2]

Group 3: Kling AI's Digital Human
- Kling AI has introduced an upgraded digital human feature that supports up to 60 seconds of output at 1080P/48fps, with significantly improved facial recognition and lip-sync accuracy [3]
- The new feature allows prompt-based control of character emotions and actions, so digital humans can show richer expressions and body language [3]
- Kling's digital human service is priced at 0.12 yuan per second at 720P, roughly one-third the cost of comparable products from HeyGen and close to the lowest price in the industry [3]

Group 4: Tencent's AI Painting Upgrade
- Tencent's Hunyuan team has proposed a new method for optimizing AI painting that improves diffusion model training through two techniques, Direct-Align and Semantic Relative Preference Optimization (SRPO) [4]
- Direct-Align optimizes the entire diffusion trajectory, addressing the "reward hacking" problem seen in traditional methods that only optimize the later stages [4]
- The FLUX.1-dev model trained with SRPO shows a threefold increase in realism and aesthetic scores, and the training takes only 10 minutes on 32 H20 GPUs [4]

Group 5: Albania's AI Minister
- Albania has become the first country to appoint an "AI Minister," named Diella, which will oversee public procurement projects [5]
- Diella is intended as a benchmark for government transparency reform; it is responsible for evaluating tenders and selecting personnel, with the goal of 100% integrity in public bidding [5]
- The initiative aims to address long-standing corruption in Albania's public procurement while advancing the country's digital government transformation [5]

Group 6: xAI's Workforce Changes
- xAI has reportedly laid off about 500 employees from its data labeling team, one-third of that team; affected employees will be paid until the end of November [6]
- The company announced a strategic shift: it will reduce the number of general AI tutors and expand its specialist AI tutor team tenfold, focusing on recruiting talent from STEM, finance, and medicine [7]
- Before the layoffs, xAI required employees to take tests that determined whether they kept their jobs, which led some employees to question the fairness of the process [7]

Group 7: UCLA's Energy-Efficient Imaging
- A research team from UCLA has published a paper in Nature on an optical image generation model that consumes almost no energy; the first author is Shiqi Chen, a Zhejiang University alumnus [8]
- The system uses a digital encoder to generate static noise, imprints the noise pattern onto a laser beam with a spatial light modulator, and then converts the noise into an image with a second device [8]
- The system can produce images of handwritten digits, fashion items, and Van Gogh-style artworks; its speed and low energy use make it suitable for VR and AR displays and for wearable devices [8]

Group 8: AI Programming Challenges
- A senior developer, Carla Rover, ran into serious problems with "vibe coding," which forced a project overhaul and caused emotional distress [9]
- A report from Fastly indicates that 95% of developers need additional time to fix AI-generated code, which has given rise to "vibe coding cleanup specialists" with salaries reaching $100,000 [9]
- Many experienced developers say that AI programming resembles "caring for a 6-year-old": the AI lacks systematic thinking and often introduces security vulnerabilities, and developers spend 50% of their time on requirements and 30-40% on fixing AI code [9]

Group 9: Anthropic's AI Economic Index
- Anthropic has released its first comprehensive AI economic index report, which shows that the share of users delegating complete tasks to Claude has risen from 27% to 39% [10]
- The report finds a close correlation between AI usage and regional economic characteristics: Washington D.C. and Utah have the highest per capita usage, while usage in Hawaii centers on travel planning and in Massachusetts on scientific research [10]
- Regions with higher GDP show higher AI usage rates, and wealthier countries show more diverse use cases; enterprise users have an automation rate of 77%, significantly higher than that of individual users [10]
From "lip-syncing" to "performing": the newly evolved Kling AI digital human makes its technology public
Jiqizhixin (机器之心) · 2025-09-15 12:19
Core Viewpoint
- Kuaishou's Kling team has introduced a new paradigm for digital human generation with the Kling-Avatar project, which produces expressive, natural performances in long videos and moves beyond simple lip-syncing to full-body expression and emotional engagement [2][31].

Group 1: Technology and Framework
- Kling-Avatar uses a two-stage generative framework driven by a multimodal large language model, which turns audio, visual, and textual inputs into a coherent storyline for video generation [6][10].
- A multimodal director module organizes the inputs into a structured narrative: it extracts speech content and the emotional trajectory from the audio, identifies human features and scene elements from the image, and translates the user's text prompt into actions and emotional expressions [8][10].
- The system first generates a blueprint video that fixes the overall rhythm, style, and key expression nodes, and then uses the blueprint to create high-quality sub-segment videos [12][28].

Group 2: Data and Training
- The Kling team collected thousands of hours of high-quality video from sources such as speeches and dialogues, and trained multiple expert models to assess video quality along several dimensions [14].
- To evaluate digital human video generation methods, the team built a benchmark of 375 samples, each consisting of a reference image, an audio clip, and a text prompt; it provides a challenging test of multimodal instruction following [14][23].

Group 3: Performance and Results
- In a comparative evaluation against advanced products such as OmniHuman-1 and HeyGen, Kling-Avatar scored higher on overall effectiveness, lip-sync accuracy, visual quality, control response, and identity consistency [16][24].
- The generated lip movements were closely synchronized with the audio, and facial expressions adapted naturally to vocal variation, even on phonetically complex sounds [25][26].
- Kling-Avatar generates long videos efficiently: it produces multiple segments in parallel from a single blueprint video while maintaining quality and coherence throughout [28].

Group 4: Future Directions
- The Kling team plans to keep working on high-resolution video generation, fine-grained motion control, and understanding of complex multi-turn instructions, with the aim of giving digital humans a genuine and captivating presence [31].
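
The flow summarized under "Technology and Framework" (director, then blueprint, then sub-segments rendered in parallel) can be sketched roughly as follows. This is only an illustrative outline, not Kling-Avatar's actual code: every class and function name here (`Storyline`, `direct`, `make_blueprint`, `render_segment`, `generate`) is hypothetical, the model calls are replaced by string stand-ins, and the assumption that each sub-segment is conditioned on one blueprint key node is ours rather than something the summary states.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Storyline:
    """Structured narrative produced by the (hypothetical) director step."""
    speech: str    # speech content and emotional trajectory, from the audio
    subject: str   # human features and scene elements, from the image
    actions: str   # actions and expressions, from the text prompt


def direct(audio: str, image: str, prompt: str) -> Storyline:
    """Stand-in for the multimodal director; a real system would call an MLLM."""
    return Storyline(speech=audio, subject=image, actions=prompt)


def make_blueprint(story: Storyline, num_segments: int) -> list:
    """Stand-in for stage one: one key node per planned sub-segment."""
    return [f"node-{i}({story.actions})" for i in range(num_segments)]


def render_segment(index: int, node: str) -> str:
    """Stand-in for stage two: render one sub-segment from its key node."""
    return f"segment-{index}<{node}>"


def generate(audio: str, image: str, prompt: str, num_segments: int = 4) -> list:
    story = direct(audio, image, prompt)
    blueprint = make_blueprint(story, num_segments)
    # Segments depend only on the shared blueprint, so they can run in parallel.
    # Executor.map returns results in input order, so the video stays in sequence.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(render_segment, range(num_segments), blueprint))


if __name__ == "__main__":
    for segment in generate("speech.wav", "portrait.png", "smile, then wave"):
        print(segment)
```

The point of the sketch is the dependency structure: because each sub-segment reads only from the shared blueprint and not from its neighbors, the number of segments can grow without making generation strictly sequential.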