Tencent Research Institute AI Digest 20260115
Tencent Research Institute · 2026-01-14 16:03

Group 1: US Export Control Regulations
- The US Department of Commerce's Bureau of Industry and Security has relaxed export controls on high-performance chips, allowing Nvidia's H200 and AMD's MI325X to be exported to China under specific conditions [1]
- The new rules require applicants to demonstrate sufficient supply in the US market and that exports do not exceed 50% of total US sales; projections indicate the H200 could generate over $47.6 billion in revenue for Nvidia by 2026, including nearly $16 billion from the Chinese market [1]
- Concurrently, the US House of Representatives passed the Remote Access Security Act, which may affect overseas data center projects by restricting remote access to advanced computing power for AI model training [1]

Group 2: Google Veo 3.1 Upgrade
- Google Veo 3.1 has been upgraded to support "material-based video" generation, letting users create high-quality videos from uploaded images and text instructions, with unprecedented consistency in character representation [2]
- The new version supports native 9:16 vertical output and industry-leading 1080p and 4K super-resolution, eliminating the need for post-production upscaling and the associated quality loss, making it well suited to platforms like YouTube Shorts [2]
- The functionality has been rolled out in the YouTube Shorts and YouTube Create apps, with enhanced versions being pushed to Flow, the Gemini API, Vertex AI, and Google Vids [2]

Group 3: Zhipu and Huawei Collaboration
- Zhipu AI has partnered with Huawei to open-source a new-generation image generation model, GLM-Image, the first SOTA multimodal model trained on domestic chips [3]
- The model uses an innovative "autoregressive + diffusion decoder" hybrid architecture and ranks first among open-source models on CVTG-2K and LongText-Bench, with a Chinese text rendering score of 0.979 [3]
- API calls cost only 0.1 yuan per generated image; the model excels in knowledge-intensive scenarios such as posters, PPT slides, and Chinese character generation, and is available on GitHub and Hugging Face [3]

Group 4: PixVerse R1 Release
- Aishi Technology has released PixVerse R1, the world's first real-time world model, generating video at up to 1080p and allowing users to intervene in the generation process in real time [4]
- The model is built on an Omni native multimodal foundation model, an autoregressive streaming generation mechanism, and an instant-response engine, turning video generation from "fixed segments" into an "infinite visual stream" [4]
- It defines a new form of "Playable Reality," making video a continuously running process that can be steered in real time; the product is currently in beta with invitation-only access [4]

Group 5: Vidu's One-Click MV Generation
- Vidu AI has launched a "one-click MV" feature: users submit music, reference images, and text instructions, and the system automatically outputs a coherent, high-quality music video [6]
- The system uses a deeply collaborative multi-agent framework, with director, storyboard, visual-generation, and editing agents producing a complete video within minutes [6]
- Its "multi-image reference video generation" technology lets users upload up to seven reference images, accurately reproducing character features and visual style in videos up to five minutes long, with frame-level audio-visual alignment [6]

Group 6: 1X Company's NEO Robot
- 1X has introduced a new "brain" for its home humanoid robot NEO, which learns how the physical world works by watching large volumes of online video and first-person human operation recordings [7]
- The system is based on a 14-billion-parameter generative video model and a multi-stage training strategy, including 900 hours of first-person human mid-training and 70 hours of embodied fine-tuning; it generates a video of the task being completed successfully before executing the actions [7]
- An inverse dynamics model (IDM), trained on 400 hours of unfiltered robot data, extracts the corresponding action trajectories from the generated videos (a minimal sketch of the idea follows below); the official announcement tweets have surpassed 5 million views [7]
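The IDM mentioned above reflects a common idea in robot learning: predict the action that explains the transition between two consecutive observations, then use that model to label generated or otherwise action-free video with action trajectories. Below is a minimal, hypothetical PyTorch sketch of the concept; the frame size, action dimension, layer sizes, and training loop are placeholder assumptions, not 1X's actual architecture or code.

```python
# Minimal sketch of an inverse dynamics model (IDM): given two consecutive
# frames, predict the action that produced the transition. All shapes and
# hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    def __init__(self, action_dim: int = 20):
        super().__init__()
        # Shared encoder turns a 64x64 RGB frame into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 14 * 14  # encoder output size for a 64x64 input
        # Head maps concatenated (frame_t, frame_t+1) features to an action.
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, frame_t: torch.Tensor, frame_tp1: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.encoder(frame_t), self.encoder(frame_tp1)], dim=-1)
        return self.head(z)

# Training sketch: supervise on logged (frame, next_frame, action) triples;
# once trained, the IDM can be run over generated video to recover actions.
idm = InverseDynamicsModel()
optimizer = torch.optim.Adam(idm.parameters(), lr=1e-4)
frames_t = torch.randn(8, 3, 64, 64)    # placeholder batch of current frames
frames_tp1 = torch.randn(8, 3, 64, 64)  # placeholder batch of next frames
actions = torch.randn(8, 20)            # placeholder logged joint commands
loss = nn.functional.mse_loss(idm(frames_t, frames_tp1), actions)
loss.backward()
optimizer.step()
```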
Group 7: League of Legends Mysterious Player
- A mysterious player on the Korean server achieved a 95% win rate, playing 56 matches in just 51 hours with a record of 52 wins and 4 losses and climbing from below Diamond to the top of the ladder [8]
- The account used 22 different champions in ranked games and posted an 86% lane win rate, far ahead of the Korean server's top ten players, sparking speculation that the player could be Elon Musk's AI [8]
- Following T1's world championship win in 2025 and Musk's challenge to top teams, the speculation has only grown, but the account's true identity remains a mystery [8]

Group 8: Google MedGemma 1.5 Release
- Google Research has released MedGemma 1.5, which supports high-dimensional medical image analysis, including three-dimensional CT and MRI data and whole-slide digital pathology images [9]
- Disease classification accuracy on MRI improved from 51% to 65%, anatomical structure localization accuracy rose from 3% to 38%, and MedQA accuracy increased from 64% to 69% [9]
- A MedASR speech recognition model has also been launched, achieving a word error rate of only 5.2% on chest X-ray report dictation, outperforming the general-purpose Whisper model by 82% (see the short calculation at the end of this digest); it is now available on Hugging Face and Vertex AI [9]

Group 9: Google Cloud AI Director's Insights
- Addy Osmani, a director at Google Cloud AI, raised five critical questions about the future of software engineering in the AI era, including whether junior engineers are still necessary and whether computer science degrees remain relevant [10][11]
- A Harvard study found that the introduction of generative AI was followed by a 9%-10% decline in junior developer positions over six quarters, while senior engineer employment remained stable; major tech companies have cut entry-level hiring by 50% [11]
- Recommendations: junior engineers should build AI-integrated portfolios and hand-code key algorithms, while senior engineers should focus on architecture review to adapt to an "agent-based" engineering environment [11]
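On Group 8's claim that MedASR "outperforms Whisper by 82%": one plausible reading, assumed here rather than stated in the source, is an 82% relative reduction in word error rate (WER). Under that assumption, the implied baseline error rate works out to roughly 29%, as the short calculation below shows.

```python
# Hypothetical interpretation of "outperforming Whisper by 82%" as a relative
# reduction in word error rate (WER); the source does not define the metric.
medasr_wer = 0.052                  # MedASR WER reported for report dictation
relative_reduction = 0.82           # assumed meaning of the "82%" figure
implied_baseline_wer = medasr_wer / (1 - relative_reduction)
print(f"Implied Whisper WER: {implied_baseline_wer:.1%}")  # ~28.9%
```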