Tencent Research Institute AI Express 20250704
Tencent Research Institute · 2025-07-03 15:31
Group 1
- Google, Nvidia, and seven other institutions have launched Mirage, the world's first AI-native UGC game engine, which generates game content in real time from natural-language commands [1]
- Mirage runs smoothly at 16 FPS and supports 5-10 minutes of continuous gameplay, with graphics quality comparable to GTA and Forza [1]
- The core technology is a "world model" built from Transformer and diffusion models, trained on extensive gaming data to enable dynamic interaction and real-time control [1]

Group 2
- Zhiyuan Research Institute has released OmniGen2, a unified image-generation model that supports text-to-image, image editing, and subject-driven image generation [2]
- The model introduces an innovative image-generation reflection mechanism that significantly improves context understanding, instruction following, and image quality [2]
- OmniGen2 offers an open research preview, with model weights, training code, and training data fully open-sourced; it passed 2,000 GitHub stars within a week [2]

Group 3
- Google is providing the Gemini AI tool suite free to educators worldwide, deeply integrated into Google Classroom and ChromeOS [3]
- Gemini in Classroom includes over 30 AI tools that automatically generate lesson plans, classroom activities, and quiz questions, saving teachers preparation time [3]
- New AI tools such as NotebookLM and Gems, along with data-analysis features, aim to deliver personalized learning experiences and data-driven teaching [3]

Group 4
- Xingliu Agent is a multifunctional AI creation platform that handles creative tasks such as batch emoji generation, brand VI design, video generation, and 3D modeling through natural-language commands [4][5]
- Key features include high-quality bulk content generation, Kontext intelligent image editing, and full-media workflow support, establishing a new "vibe designing" paradigm [5]
- The platform offers free trial credits and supports diverse creative outputs, shifting the designer's role from "mastering technology" to "understanding needs and expressing creativity" [5]

Group 5
- Tencent Yuanbao has added a feature supporting AI-based image and video content search, intelligently matching content without restrictions on model usage [6]
- Results can intelligently reference related video tutorials, pairing text with video explanations and offering one-click playback [6]
- Users can continue asking follow-up questions after the initial answer, improving the interactive experience [6]

Group 6
- Saining Xie's team has released the Blender Fusion framework, enabling precise control of 3D scenes without relying on text prompts [7]
- The core technology is a three-step pipeline: separating objects from the scene with the SAM model, editing in Blender, and generating high-quality composite images with a diffusion model [7]
- The system employs a dual-stream diffusion compositor, using techniques such as source masking and simulated object jittering to improve generalization and realism [7]

Group 7
- xAI is set to release the new Grok 4 series, including the flagship Grok 4 and the specialized programming model Grok 4 Code, with launch expected after U.S. Independence Day [8]
- Grok 4 features a 130,000-token context window and supports function calling, structured outputs, and reasoning, but currently lacks vision and image-generation capabilities [8]
- Elon Musk aims for Grok 4 to rewrite the human knowledge base, filling in missing information and correcting errors, while Grok 4 Code will serve as a professional programming assistant [8]

Group 8
- The U.S. Department of Commerce has lifted the temporary bans on the three major EDA companies, Siemens, Synopsys, and Cadence, restoring Chinese customers' full access to their software and technology [11]
- The earlier, sudden export restriction triggered a sharp drop in their stock prices, with Synopsys forecasting a 28% year-on-year decline in revenue from the China region [11]
- The domestic EDA industry still faces challenges in maturity and market share, as chip-design companies prefer more mature foreign products to ensure successful tape-out [11]

Group 9
- The World Economic Forum's "Future of Jobs Report 2025" indicates that AI and machine-learning specialists will be the fastest-growing occupation, with job numbers expected to grow 86% [12]
- AI is set to reshape the global labor market: data analytics, cybersecurity, and technical literacy are the three fastest-growing skills, while traditional roles such as data-entry clerks and administrative assistants face declining demand [12]
- Roughly 39% of employees' skills are expected to change significantly between 2025 and 2030, yet only 50% of employees have received systematic training, and 63% of employers view skill gaps as the biggest obstacle to business transformation [12]
New from Saining Xie's team: precise 3D scene control without text prompts
QbitAI · 2025-07-03 04:26
Core Viewpoint
- The article discusses Blender Fusion, an innovative framework developed by Saining Xie's team that combines a graphics tool (Blender) with diffusion models to enable precise control and flexible manipulation of visual compositions, moving beyond traditional text prompts [6][9]

Group 1: Blender Fusion Framework
- Blender Fusion lets users control the position, rotation, and scale of objects in generated images using keyboard and mouse input [2][4]
- The framework runs as a new pipeline with three main steps: object and scene separation, 3D editing in Blender, and high-quality image generation using diffusion models [9][10]

Group 2: Step-by-Step Process
- The first step is object-centric layering: objects are separated from the original scene, and their 3D information is inferred using existing visual models such as Segment Anything Model (SAM) and Depth Pro [13][14]
- The second step is Blender-grounded editing, which allows detailed editing of the separated objects and camera control within Blender [18]
- The final step is generative compositing: a dual-stream diffusion compositor refines the visual quality of the rendered scene while maintaining global consistency [22][23]

Group 3: Techniques and Results
- Two important training techniques are introduced: source masking, which teaches the model to restore complete images from conditional information, and simulated object jittering, which improves the model's ability to decouple camera and object movements [24]
- Blender Fusion demonstrates strong visual-generation capability, maintaining spatial relationships and visual coherence in complex scene edits, including single-image processing and multi-image scene recomposition [25][29]

Group 4: User Experience and Implications
- The framework gives creators greater freedom and control, allowing them to manipulate visual elements without being constrained by text prompts [33]
- The pipeline from object layering to high-fidelity generation makes AI image synthesis more intuitive and flexible, akin to building with blocks [35]
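The object-centric layering step described above can be sketched as a depth unprojection: given a segmentation mask and a depth map, masked pixels are lifted into camera-space 3D points. This is a minimal pure-Python illustration assuming a pinhole camera model; in the actual pipeline the mask would come from SAM and the depth from Depth Pro, and the intrinsics `fx`, `fy`, `cx`, `cy` here are hypothetical toy values.

```python
def unproject(depth, mask, fx, fy, cx, cy):
    """Lift masked pixels into camera-space 3D points (x, y, z)
    under a pinhole camera model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if mask[v][u] and z > 0:
                # Standard pinhole back-projection per pixel.
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points

# Toy 2x2 depth map with a single masked (object) pixel.
depth = [[0.0, 2.0],
         [0.0, 0.0]]
mask  = [[0, 1],
         [0, 0]]
pts = unproject(depth, mask, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
# pts → [(1.0, -1.0, 2.0)]
```

The resulting point set (in practice, a textured mesh or point cloud per object layer) is what gets imported into Blender for the editing step.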
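The two training techniques from Group 3 can be illustrated in a few lines. This is a hedged sketch, not the paper's implementation: `mask_source` blanks the object's source-view pixels in the conditioning image so the model must regenerate the full image rather than copy, and `jitter_pose` applies a small random rigid perturbation to an object's pose so that object motion is decoupled from camera motion during training. All ranges and data shapes are illustrative.

```python
import math
import random

def mask_source(image, mask, fill=0.0):
    """Blank the object's source-view pixels in the conditioning
    image, forcing the model to restore the complete image."""
    return [[fill if mask[v][u] else px
             for u, px in enumerate(row)]
            for v, row in enumerate(image)]

def jitter_pose(position, yaw, max_shift=0.05,
                max_rot=math.radians(5.0), rng=None):
    """Perturb an object's pose with a small random translation and
    rotation, simulating object motion independent of the camera."""
    rng = rng or random.Random()
    x, y, z = position
    jittered = (x + rng.uniform(-max_shift, max_shift),
                y + rng.uniform(-max_shift, max_shift),
                z + rng.uniform(-max_shift, max_shift))
    return jittered, yaw + rng.uniform(-max_rot, max_rot)

# Toy 2x2 grayscale image; the top-right pixel belongs to the object.
image = [[0.3, 0.9],
         [0.5, 0.7]]
mask  = [[0, 1],
         [0, 0]]
conditioned = mask_source(image, mask)   # object pixel blanked

pos, yaw = jitter_pose((0.0, 0.0, 1.0), 0.0, rng=random.Random(42))
```

In training, the jittered pose would drive a re-render of the edited scene, and the masked conditioning view plus that render form the dual-stream input to the compositor.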