Blender Fusion Framework

Tencent Research Institute AI Express 20250704
Tencent Research Institute · 2025-07-03 15:31
Group 1
- Google, Nvidia, and seven other institutions have launched Mirage, billed as the world's first AI-native UGC game engine, which generates game content in real time from natural language commands [1]
- Mirage runs smoothly at 16 FPS and supports 5-10 minutes of continuous gameplay, with graphics quality comparable to GTA and Forza [1]
- The core technology is a "world model" built on Transformer and diffusion models and trained on extensive gaming data to enable dynamic interaction and real-time control [1]

Group 2
- Zhiyuan Research Institute (BAAI) has released OmniGen2, a unified image generation model that supports text-to-image, image editing, and theme-driven image generation [2]
- The model introduces a reflection mechanism for image generation that significantly improves context understanding, instruction adherence, and image quality [2]
- OmniGen2 is available in an open research preview version, with model weights, training code, and training data fully open-sourced; it earned over 2,000 stars on GitHub within a week [2]

Group 3
- Google has announced that the Gemini AI tool suite will be free for educators worldwide, deeply integrated into Google Classroom and ChromeOS [3]
- Gemini in Classroom includes over 30 AI tools that can automatically generate lesson plans, classroom activities, and quiz questions, saving teachers preparation time [3]
- New AI tools such as NotebookLM and Gems, along with data analysis features, aim to enable personalized learning experiences and data-driven teaching [3]

Group 4
- Xingliu Agent is a multifunctional AI creation platform that completes creative tasks such as batch emoji generation, brand VI design, video generation, and 3D modeling from natural language commands [4][5]
- Key features include bulk generation of high-quality content, Kontext intelligent image editing, and full media workflow support, establishing a new design paradigm called "Vibe designing" [5]
- The platform offers free trial credits and supports diverse creative outputs, shifting the designer's role from "mastering technology" to "understanding needs and expressing creativity" [5]

Group 5
- Tencent Yuanbao has introduced a feature that supports AI-based search of image and video content, matching content intelligently with no restriction on which model is used [6]
- Results can reference related video tutorials, combining text and video explanations, with one-click access to watch the videos [6]
- Users can ask follow-up questions after receiving an initial answer, making the experience more interactive [6]

Group 6
- The Xie Saining team has released the Blender Fusion framework, which enables precise control of 3D scenes without relying on text prompts [7]
- The core technology is a three-step process: separating objects and scenes using the SAM model, editing in Blender, and generating high-quality composite images with a diffusion model [7]
- The system employs a dual-stream diffusion synthesizer and improves generalization and realism through techniques such as source occlusion and simulated object jitter [7]

Group 7
- xAI is set to release the new Grok 4 series, including the flagship Grok 4 and the specialized programming model Grok 4 Code, with a launch expected after U.S. Independence Day (July 4) [8]
- Grok 4 has a context window of 130,000 tokens and supports function calling, structured outputs, and reasoning, but currently lacks vision and image generation capabilities [8]
- Elon Musk aims for Grok 4 to rewrite the human knowledge base, filling in missing information and correcting errors, while Grok 4 Code will serve as a professional programming assistant [8]

Group 8
- The U.S. Department of Commerce has lifted temporary bans on the three major EDA companies, Siemens, Synopsys, and Cadence, restoring full access to their software and technology for Chinese customers [11]
- Previously, a sudden export restriction led to a significant drop in stock prices, with Synopsys predicting a 28% year-on-year decline in revenue from the China region [11]
- China's domestic EDA industry faces challenges in maturity and market share, as chip design companies prefer more mature foreign products to ensure successful tape-out [11]

Group 9
- The World Economic Forum's "2025 Global Future of Jobs Report" indicates that AI and machine learning specialists will be the fastest-growing occupations, with an expected growth of 86% in job numbers [12]
- AI is set to reshape the global labor market, with data analytics, cybersecurity, and technical literacy emerging as the three fastest-growing skills, while traditional roles such as data entry clerks and administrative assistants face declining demand [12]
- Approximately 39% of employees' skills are expected to change significantly between 2025 and 2030, yet only 50% of employees have received systematic training, and 63% of employers view skill gaps as the biggest obstacle to business transformation [12]
New work from Xie Saining's team: precise 3D scene control without prompts
QbitAI · 2025-07-03 04:26
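henry, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

Generating images from text has by now become as ordinary as drawing with a pen. But have you ever thought of steering the picture with the arrow keys? Like this: press the arrow keys (or drag a slider with the mouse) to move objects in the frame left and right, rotate them, or scale them.

This trick comes from the Blender Fusion framework newly released by Xie Saining's team. By combining a graphics tool (Blender) with diffusion models, it frees visual compositing from relying solely on text prompts and delivers precise control and flexible manipulation of the picture.

Image compositing in three steps

The core of BlenderFusion's "keypress image generation" is not an innovation in the model itself but an efficient combination of existing techniques (segmentation, depth estimation, Blender rendering, diffusion models) into a new pipeline. The pipeline has three steps: first separate the objects from the scene → then perform 3D editing in Blender → finally generate a high-quality composite image with a diffusion model. Let's look at how each step works.

Step 1: Object-centric layering

The first step separates each object in the input image or video from its original scene and infers its 3D information. Specifically, BlenderFusion uses existing ...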
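The move, rotate, and scale controls described in this article correspond to a standard geometric transform applied to an object's 3D points. The following is a minimal sketch in plain Python, not BlenderFusion's code: the function name `edit_object`, its parameters, and the choice to pivot about the object's centroid are all assumptions made for illustration. In Blender itself, comparable edits are made through an object's location, rotation, and scale properties.

```python
import math


def edit_object(points, translate=(0.0, 0.0, 0.0), yaw_deg=0.0, scale=1.0):
    """Scale, rotate about the vertical (y) axis, then translate 3D points.

    Hypothetical helper for illustration only. Scaling and rotation pivot
    on the centroid so the object resizes and turns in place.
    """
    if not points:
        return []

    n = len(points)
    # Centroid of the object, used as the pivot for scale and rotation.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n

    theta = math.radians(yaw_deg)
    c, s = math.cos(theta), math.sin(theta)
    tx, ty, tz = translate

    edited = []
    for x, y, z in points:
        # Centroid-relative coordinates, uniformly scaled.
        dx, dy, dz = (x - cx) * scale, (y - cy) * scale, (z - cz) * scale
        # Standard rotation about the y axis.
        rx = c * dx + s * dz
        rz = -s * dx + c * dz
        # Back to world coordinates, then apply the translation.
        edited.append((rx + cx + tx, dy + cy + ty, rz + cz + tz))
    return edited


if __name__ == "__main__":
    # Two points on the x axis stand in for an object's geometry.
    obj = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    print(edit_object(obj, translate=(0.5, 0.0, 0.0)))  # "arrow key" move
    print(edit_object(obj, yaw_deg=90.0))               # rotate in place
    print(edit_object(obj, scale=2.0))                  # scale in place
```

Pivoting on the centroid is one design choice among several; pivoting on the world origin instead would make a rotation also shift the object's position, which is usually not what an interactive slider is expected to do.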