Tencent Research Institute AI Express, 2025-10-28

Group 1: Tesla's World Simulator
- Tesla has officially unveiled its neural-network "World Simulator," which generates a synthetic twin world for autonomous driving and takes in the equivalent of 500 years of human driving experience every day to keep improving itself [1]
- The simulator uses an end-to-end neural network architecture and generates continuous footage from eight cameras at 24 frames per second, sustaining a realistic six-minute driving sequence [1]
- With the end-to-end approach, Tesla's system maps raw pixel input directly to steering angle and throttle/brake intensity, which eliminates information loss between modules and lets it learn human values for complex on-road decisions [1]

Group 2: Meituan's LongCat-Video Model
- Meituan has launched the LongCat-Video video generation model, built on the DiT (Diffusion Transformer) architecture and supporting three core tasks: text-to-video, image-to-video, and video continuation [2]
- The model can stably generate five-minute videos without quality loss, and a three-tier optimization process lets it produce a five-second 720P video in just 10 seconds [2]
- LongCat-Video reaches state-of-the-art performance in text-to-video and image-to-video tasks and is particularly strong at long-video generation, which suits applications such as digital humans and embodied intelligence [2]

Group 3: MiniMax's M2 Model
- MiniMax has released and open-sourced the M2 model, which ranks fifth on the Artificial Analysis intelligence index, making it the only Chinese model in the top five; it is priced at only 1/12 of Claude 4.5 and 1/7 of GPT-5 [3]
- M2 scored 69.4 on SWE-bench Verified and performed strongly across multiple other tests, topping a global financial search benchmark with a score of 65.5 [3]
- M2 integrates with mainstream development tools such as Claude Code and Cursor, and API and Agent access are free for 14 days; its cost-performance advantage breaks the "intelligence, speed, price" trade-off triangle [3]

Group 4: Doubao Video Model
- Volcano Engine has launched Seedance 1.0 pro fast, a Doubao video generation model that is approximately three times faster and 72% cheaper [4]
- Generating a five-second 1080P video costs only 1.03 yuan, so a budget of 10,000 yuan yields 9,709 videos, an improvement of 3.56 times over the pro version [4]
- The model strengthens core capabilities such as instruction following, seamless multi-shot storytelling, and detail rendering, and shows significant advantages over mainstream global models such as Veo 3.0 Fast in image-to-video generation [4]

Group 5: Skywork AI's Web Cloning
- Kunlun Wanwei's Skywork AI has introduced a web cloning feature that generates a fully functional web prototype in minutes from a webpage link, uploaded files, or a text description [5][6]
- The system analyzes the webpage's DOM structure, visual partitioning, and semantic relationships in depth, reproducing the page with high fidelity across multiple dimensions [6]
- It supports three creation methods: automatic generation from uploaded files, one-click cloning from a provided URL, and intelligent generation from a pure text description, significantly lowering the technical barrier to building a website [6]

Group 6: xAI's AI Virtual Girlfriend
- xAI, founded by Elon Musk, has introduced the AI virtual companion feature Grok Companions, with the first character Mika, a green-haired anime-style character that engages users in flirty conversations [7]
- Mika is positioned as an emotional product rather than a tool. It has raised concerns among parents and the media because certain modes can unlock "adult tones," and its "child mode" may be activated by mistake [7]
- Grok currently features five AI companions: Mika, Ani, Valentine, Good Rudi, and Bad Rudi, as xAI explores the market potential of AI as an emotional product [7]

Group 7: Sam Altman's Non-Invasive Brain-Computer Interface
- OpenAI CEO Sam Altman has recruited Caltech professor Mikhail Shapiro to Merge Labs, a brain-computer interface startup valued at $850 million that is raising $250 million in funding [8]
- Shapiro works on non-invasive, ultrasound-based neural imaging and control, in contrast to Neuralink's invasive approach, with the aspiration of "controlling ChatGPT with thoughts" [8]
- Shapiro has received several prestigious awards for his research, which introduces genes into cells so that they respond to ultrasound, paving the way for less invasive brain-computer interfaces [8]

Group 8: Work Hours in Silicon Valley AI Labs
- The Wall Street Journal reports that top AI researchers and executives in Silicon Valley are working 80 to 100 hours a week, a pace likened to wartime, in an effort to compress 20 years of progress into just two years [9]
- Anthropic researchers work late into the night chasing inspiration, while DeepMind researchers follow a "0-0-2" schedule, resting only two hours a week [9]
- OpenAI has mandated a week of leave for all employees in response to talent loss and burnout, while Meta's new superintelligence lab is offering signing bonuses of more than $100 million to attract OpenAI's core researchers, igniting a talent war [9]

Group 9: DeepMind's DiscoRL Method
- Google DeepMind has proposed the DiscoRL method, in which multiple generations of agents autonomously discover reinforcement learning (RL) rules by interacting with a variety of environments; the research was published in Nature [10]
- DiscoRL outperformed all existing rules on the Atari benchmark, achieving an IQM of 13.86, and also excelled on benchmarks it had not encountered before, such as ProcGen, Crafter, and NetHack [10]
- The research indicates that RL performance depends on data (environments) and computational resources, suggesting that future advanced RL algorithms for AI may be discovered autonomously rather than designed by humans [11]
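
The IQM cited for DiscoRL in Group 9 conventionally stands for interquartile mean in RL evaluation: the average of the middle 50% of scores, with the lowest and highest quarters discarded so that a few outlier games do not dominate the result. The sketch below is illustrative only. The function name, the floor-based trimming, and the sample scores are assumptions made for this example and are not taken from the DeepMind paper.

```python
from statistics import mean


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (lowest and highest 25% dropped).

    Assumption: the trim count is floored, so with n scores exactly
    n // 4 values are removed from each end.
    """
    ordered = sorted(scores)
    cut = len(ordered) // 4  # number of values trimmed from each end
    return mean(ordered[cut:len(ordered) - cut])


# Hypothetical normalized scores for eight games (not real DiscoRL results).
example_scores = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 10.0, 50.0]

print(interquartile_mean(example_scores))  # 1.75
print(mean(example_scores))                # 8.4375
```

With these made-up scores the plain mean is pulled up to 8.4375 by two very high values, while the interquartile mean stays at 1.75, which is why the metric is preferred for aggregate benchmark reporting.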