Veo 3 Fast
Tencent Research Institute AI Express 20250918
Tencent Research Institute (腾讯研究院) · 2025-09-17 16:01
Group 1
- World Labs, the company founded by Fei-Fei Li, launched the spatial intelligence model Marble, which can generate large-scale 3D worlds from a single image or text prompt [1]
- Compared with previous products, Marble offers larger scale, more diverse styles, and cleaner geometric structures, and supports free navigation in the browser [1]
- Users can export generated worlds as Gaussian splats that run efficiently on desktop, mobile devices, and VR headsets; whitelist testing is now open [1]

Group 2
- Google partnered with more than 60 institutions, including American Express and PayPal, to introduce the Agent Payments Protocol (AP2), which aims to create a secure standard for AI agent payments [2]
- AP2 builds trust through "Mandates," cryptographically signed digital contracts that serve as proof of user instructions and let users pre-authorize AI agents to make purchases under specific conditions [2]
- The protocol supports both real-time purchases and automated tasks that run without human involvement; a crypto-oriented extension, A2A x402, enables stablecoin payments, and a GitHub repository is available for developers [2]

Group 3
- Anthropic plans to invest $10 billion to create enterprise application clones, while OpenAI expects to spend $8 billion on data-related costs by 2030 [3]
- Both companies are training AI models to operate various professional software using "reinforcement learning environments" that simulate enterprise applications [3]
- They may hire domain experts to demonstrate task execution, aiming to develop AI as "virtual colleagues" and open new revenue streams [3]

Group 4
- Tencent Cloud announced the global launch of its upgraded Intelligent Agent Development Platform 3.0 (ADP3.0), which has shipped nearly 600 features in the past three months [4]
- The upgrade includes enhanced knowledge base management, multi-agent collaboration support, global agent visibility in workflows, and instant command capabilities [4]
- Industry-specific agents for smart quality inspection and media content processing have been introduced, and the Youtu-Agent framework and Youtu-GraphRAG knowledge graph framework are set to be open-sourced [4]

Group 5
- Disney, Warner Bros., and Universal Pictures filed a lawsuit against Chinese AI company MiniMax, accusing it of unauthorized use of IPs such as Spider-Man for AI training [5]
- The companies seek restitution of infringement profits and damages of up to $150,000 per infringement, along with a permanent injunction barring MiniMax from using the related IPs [5]
- MiniMax previously faced similar accusations from iQIYI over the drama "Canglan Jue," which highlights the significant risks of IP imitation in AIGC [6]

Group 6
- The AI tool ima has been updated to support audio file uploads in formats such as MP3, M4A, WAV, and AAC, with automatic generation of transcripts, summaries, and notes [7]
- The update adds a screenshot shortcut for desktop users, who can ask questions about a captured image, add it to a knowledge base, or save it as a note [7]
- Mobile note-taking now supports offline creation and editing, with automatic synchronization once the device is back online [7]

Group 7
- YouTube introduced a generative AI tool for Shorts creators that incorporates a customized version of Google's text-to-video model Veo 3, enabling low-latency content generation at 480p resolution [8]
- The new version can add sound and apply dynamic effects to static images [8]
- YouTube also launched a "voice-to-song" remix tool based on Google's Lyria 2 and an "AI editing" feature that automatically organizes highlights and adds music and transitions [8]

Group 8
- Figure, a humanoid robotics company, completed a Series C funding round, raising more than $1 billion at a post-money valuation of $39 billion, the highest in the embodied intelligence sector [9]
- The round was led by Parkway Venture Capital, with participation from Nvidia and Intel Capital, and is aimed at expanding production capacity and building GPU infrastructure [9]
- Figure has progressed rapidly since parting ways with OpenAI, launching the end-to-end "vision-language-action" model Helix, with robots capable of complex tasks such as folding clothes and sorting packages [9]

Group 9
- Huawei released two research reports, "Intelligent World 2035" and "Global Digital Intelligence Index 2025," forecasting key technology trends and their industry impact over the next decade [10]
- The reports predict ten major trends, including AGI as a transformative force, AI agents evolving from execution tools into decision-making partners, and human-machine collaborative programming becoming mainstream [10]
- By 2035, total computing power is expected to increase 100,000-fold, demand for AI storage capacity to grow 500-fold compared with 2025, and renewable energy to account for more than 50% of generation [10]

Group 10
- Shopify shared insights on the evolution of its AI assistant Sidekick, recommending a simple architecture, clear tool boundaries, and a modular design approach [11]
- The company suggested replacing "golden datasets" with "benchmark truth sets" that reflect real production environments, and aligning large language model evaluations with human assessments [11]
- Shopify warned about "reward hacking" and advised setting up detection mechanisms in advance, combining programmatic validation with semantic evaluation to create a multi-layer reward system [11]
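Group 10's last point, layering programmatic validation under semantic evaluation, can be illustrated with a minimal sketch. Nothing here is Shopify's actual implementation: the JSON-with-an-"answer"-key rule, the function names, and the stand-in scorer are all hypothetical. The idea shown is only that a hard, rule-based check gates the softer semantic score, so an output that games the scorer but fails validation earns no reward.

```python
import json
from typing import Callable


def programmatic_check(output: str) -> bool:
    """Layer 1 (hypothetical rule): output must be a JSON object with an 'answer' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "answer" in data


def layered_reward(output: str, semantic_score: Callable[[str], float]) -> float:
    """Return 0.0 if validation fails, else the semantic score clamped to [0, 1]."""
    if not programmatic_check(output):
        return 0.0  # a failed hard check cannot be rescued by a high soft score
    score = semantic_score(output)  # Layer 2: assumed to be supplied by the caller
    return max(0.0, min(1.0, score))


# Stand-in scorer for illustration; in practice this would be a real evaluator.
print(layered_reward('{"answer": "ok"}', lambda s: 0.8))  # 0.8
print(layered_reward("not json", lambda s: 1.0))          # 0.0
```

Clamping the semantic score is one simple guard against a scorer that returns out-of-range values; logging how often each layer rejects an output would be a natural next step toward the detection mechanisms mentioned above.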
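Google's Veo 3 now supports 1080p resolution and vertical video, with sharply lower pricing; Tencent open-sources Hunyuan Image 2.1 | AIGC Daily
创业邦 (Cyzone) · 2025-09-11 00:08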
Group 1
- Microsoft will integrate Anthropic's AI technology into Office 365, ending its exclusive reliance on OpenAI for new features in applications such as Word, Excel, Outlook, and PowerPoint [2]
- OpenAI is also working to reduce its dependence on Microsoft, launching a recruitment platform to compete with LinkedIn [2]
- The UAE has introduced K2 Think, a low-cost AI reasoning model based on Alibaba's open-source Qwen 2.5 model; with only 32 billion parameters, it reportedly outperforms larger models [2]

Group 2
- Google has updated its Veo 3 AI video generation tool to support 1080p resolution and vertical video formats, making it better suited to mobile devices and social media [2]
- Tencent has open-sourced Hunyuan Image 2.1, which supports native 2K images and bilingual input, improving the model's handling of complex prompts and the accuracy of its output [4]
X @Demis Hassabis
Demis Hassabis· 2025-09-02 00:21
More relentless 🚢!

Philipp Schmid (@_philschmid): August at Google DeepMind was like 🧞‍♂️ 🖼️ 🍌 🚀 🔍 🤏🏻
- Nano Banana (Gemini 2.5 Flash Image)
- Gemini Embedding
- Veo 3 Fast
- Genie 3
- Imagen 4 Fast
- Gemma 3 270M
- Perch 2
- Kaggle Game Arena
- Gemini API Url Context
- AI Studio Builder (UI Rework, Prompt Suggestions, GitHub https://t.co/iaRgtVp3OZ ...
X @Demis Hassabis
Demis Hassabis· 2025-07-31 23:21
Product Updates
- Veo 3 Fast and Veo 3 image-to-video are now available in the API [1]
- Veo 3 Fast is priced at $0.40 per second of video with audio [1]

Pricing and Availability
- Veo 3 Fast includes production-ready rate limits [1]
- Veo 3 Fast offers comparable quality in certain cases [1]
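Per-second pricing is easy to turn into a budget estimate. The sketch below uses only the $0.40-per-second figure quoted above; the 8-second clip length and the batch of 10 clips are illustrative assumptions, and `estimate_cost_usd` is a hypothetical helper, not part of any Google API.

```python
from decimal import Decimal

# Price quoted in the post above: $0.40 per second of video with audio.
VEO3_FAST_USD_PER_SECOND = Decimal("0.40")


def estimate_cost_usd(seconds: int, clips: int = 1) -> Decimal:
    """Estimate the cost of `clips` clips that are each `seconds` seconds long."""
    if seconds < 0 or clips < 0:
        raise ValueError("seconds and clips must be non-negative")
    return VEO3_FAST_USD_PER_SECOND * seconds * clips


# Assumed clip length of 8 seconds, for illustration only.
print(estimate_cost_usd(8))      # 3.20
print(estimate_cost_usd(8, 10))  # 32.00
```

`Decimal` is used instead of `float` so that currency amounts are not distorted by binary rounding.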
Hands-on with Gemini's new image-to-video feature: the sequel to a classic meme image has finally arrived (doge)
量子位 (QbitAI) · 2025-07-12 04:57
Core Viewpoint
- The article covers Gemini's new feature for converting images into videos with sound, demonstrating its capabilities and performance through a series of tests and examples [54].

Group 1
- Gemini has integrated Veo 3 Fast, generating videos of approximately 7-8 seconds, with each generation taking about 1-2 minutes [54].
- Google AI Pro members can generate three videos per day, and retries also count against this limit [54].
- The sound effects Gemini produces are noted to be impressive, although prompts need more specific descriptions for the generated sound to be accurate [55].

Group 2
- The article presents various tests of the new feature, including opening different types of boxes, with the resulting animations often containing humorous or unexpected elements [5][20][24].
- Ratings for the generated videos vary: some score highly on speed and fun, while others score lower on visual effects [17][22][26].
- Limitations are noted, such as the inability to generate specific human likenesses and the need for detailed prompts to achieve the desired outcome [56][57].
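The daily limit described in Group 1, where a retry costs as much as a first attempt, can be tracked with a few lines of local bookkeeping. This is only a sketch for planning one's own usage: `DailyQuota` is a hypothetical class, the per-calendar-day reset is an assumption, and none of it reflects how Google actually enforces the limit.

```python
from dataclasses import dataclass, field
from datetime import date

DAILY_LIMIT = 3  # from the article: three generations per day on Google AI Pro


@dataclass
class DailyQuota:
    limit: int = DAILY_LIMIT
    used: dict = field(default_factory=dict)  # maps a date to attempts used that day

    def remaining(self, day: date) -> int:
        """Attempts left on `day` (assumes the quota resets per calendar day)."""
        return self.limit - self.used.get(day, 0)

    def try_generate(self, day: date) -> bool:
        """Record one attempt, first try or retry alike; False if the quota is spent."""
        if self.remaining(day) <= 0:
            return False
        self.used[day] = self.used.get(day, 0) + 1
        return True


quota = DailyQuota()
day = date(2025, 7, 12)
print([quota.try_generate(day) for _ in range(4)])  # [True, True, True, False]
```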