Tencent Research Institute: AI Weekly Top 50 Keywords
Tencent Research Institute · 2025-05-23 09:10
AI Frontier Weekly Top 50 Keywords (May 19-23)

Fifty keywords each week to capture the full picture of AI developments; each keyword links to a short summary of the item.

Recommended reading: Yan Deli (闫德利), "The Nature of Technological Innovation" (《技术创新的性质》).

| Category | Top Keyword | Entity |
| --- | --- | --- |
| Event | Officially acquires io | OpenAI |
| Compute | Abu Dhabi data center | OpenAI |
| Compute | GB300 and others | NVIDIA |
| Compute | CloudMatrix 384 and others | Huawei |
| Compute | TPU adoption | Google |
| Models | SWE-1 model | Windsurf |
| Models | BGE embedding models | BAAI (智源研究院) |
| Models | Model matrix update | Tencent |
| Models | Gemini Diffusion | Google |
| Models | Devstral | Mistral |
| Applications | Codex | OpenAI |
| Applications | Hunyuan Image 2.0 | Tencent |
| Applications | New image-generation feature | Manus |
| Applications | LightL ... | |
Tencent Research Institute AI Express, 2025-05-19
Tencent Research Institute · 2025-05-18 14:33
Group 1: OpenAI and AI Programming Tools
- OpenAI launched a new AI programming tool, Codex, powered by the codex-1 model; it generates cleaner code and automatically iterates on tests until they pass [1]
- Codex runs in a cloud sandbox environment, can handle multiple programming tasks in parallel, and integrates with GitHub to preload code repositories [1]
- The tool is currently available to ChatGPT Pro subscribers; rate limits are planned, with the option to purchase additional credits for heavier usage [1]

Group 2: Image Generation Technologies
- Tencent's Hunyuan Image 2.0 achieves millisecond-level image generation, letting users see the image change in real time as they type the prompt and breaking the traditional 5-10 second generation wait [2]
- The new model supports both text-to-image and image-to-image, with adjustable reference strength for the generation process [2]
- Manus introduced an image generation feature that understands user intent and plans a solution, offering a one-stop service from brand design to website deployment, though complex tasks may take several minutes to complete [3]

Group 3: Google and LightLab Project
- Google launched the LightLab project, which uses diffusion models to give precise control over light and shadow in images, allowing adjustments to light intensity and color [4][5]
- The research team built a training dataset by combining real photo pairs with synthetic rendered images, achieving better PSNR and SSIM than existing methods [5]

Group 4: Supermemory API
- Supermemory released the Infinite Chat API, which acts as a transparent proxy between applications and LLMs, maintaining dialogue context beyond the 20,000-token limit of large models [6]
- The API uses RAG techniques to manage overflow context, claims to cut token consumption by 90%, and can be integrated into existing applications with a single line of code (see the sketch below) [6]
- Pricing is a fixed $20 per month, with the first 20,000 tokens of each conversation free and $1 per million tokens beyond that [6]
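As an illustration of the "transparent proxy" integration pattern described above, here is a minimal sketch using the OpenAI Python client. The base URL and header name are hypothetical placeholders, not Supermemory's actual endpoints; the point is that routing requests through a context-managing proxy usually amounts to swapping the base URL while the rest of the application code stays the same.

```python
# Minimal sketch of the "transparent proxy" pattern for LLM calls.
# NOTE: the proxy URL and header below are hypothetical placeholders,
# not Supermemory's real API; consult the provider's docs for actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",             # hypothetical proxy endpoint
    api_key="YOUR_UPSTREAM_LLM_KEY",
    default_headers={"x-proxy-key": "YOUR_PROXY_KEY"},   # hypothetical proxy auth header
)

# The application code is unchanged: the proxy intercepts each request,
# retrieves or summarizes older turns (e.g., via RAG), and forwards a
# trimmed context to the upstream model.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize our discussion so far."}],
)
print(response.choices[0].message.content)
```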
Group 5: Grok AI Controversy
- The Grok AI assistant faced backlash for inserting controversial content about "white genocide" into responses, attributed to unauthorized modification of its system prompts by an employee [7]
- xAI publicly released Grok's prompts on GitHub and committed to strengthening review mechanisms and forming a monitoring team [7]
- The incident highlighted the security vulnerability of AI systems that rely heavily on prompts, with research indicating that mainstream models can be compromised through specific prompting techniques [7]

Group 6: Windsurf and SWE-1 Model
- Windsurf launched the SWE-1 model, which focuses on optimizing the entire software engineering process rather than just code generation; it is the company's first release since being acquired by OpenAI for $3 billion [8]
- SWE-1 performs comparably to models like GPT-4.1 on programming benchmarks but lags behind Claude 3.7 Sonnet, with a commitment to keeping service costs below Claude 3.5 Sonnet [8]

Group 7: Google TPU vs. OpenAI's GPUs
- Google's TPUs deliver AI compute at roughly one-fifth the cost of the NVIDIA GPUs OpenAI relies on, while maintaining comparable performance [10]
- Google's Gemini 2.5 Pro API is priced 4-8 times lower than OpenAI's o3 model, reflecting different market strategies [10]
- Apple's decision to train its AFM model on Google TPUs may push other companies to explore alternatives to NVIDIA GPUs [10]

Group 8: Lovart's Design Philosophy
- Lovart's founder describes a three-stage evolution of AI image products: from single-shot content generation, to workflow tools, and now to AI-driven agents [11]
- The design philosophy aims to restore the original essence of design and enable natural interaction between AI and users [11]
- Lovart believes that generalist product managers will be replaced by designers with specialized knowledge: "we have no product managers, only designers" [11]

Group 9: Lilian Weng's Insights on Model Thinking
- Lilian Weng discusses the importance of "thinking time" in large models, arguing that spending more compute at test time can improve performance on complex tasks [12]
- Current test-time strategies include parallel sampling and sequential revision, which require balancing thinking time against compute cost (a minimal sketch of both follows below) [12]
- Research indicates that optimizing chains of thought with reinforcement learning can lead to reward hacking, which calls for further investigation [12]
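To make the two test-time strategies above concrete, here is a minimal, self-contained sketch. The `generate` and `score` functions are hypothetical stand-ins for a model's sampling call and an answer-quality verifier; a real system would plug in an LLM and a learned or rule-based scorer.

```python
# Toy sketch of two test-time compute strategies: parallel sampling
# (best-of-n) and sequential revision. `generate` and `score` are
# hypothetical stand-ins for an LLM call and an answer verifier.
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for sampling one candidate answer from a model."""
    random.seed(seed)
    return f"answer-{random.randint(0, 999)} to: {prompt}"

def score(answer: str) -> float:
    """Stand-in for a verifier / reward model scoring an answer."""
    return (sum(ord(c) for c in answer) % 100) / 100.0

def parallel_sampling(prompt: str, n: int = 8) -> str:
    """Best-of-n: sample n candidates independently, keep the highest scoring."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)

def sequential_revision(prompt: str, steps: int = 4) -> str:
    """Revise one candidate repeatedly, keeping a revision only if it scores better."""
    best = generate(prompt, seed=0)
    for step in range(1, steps + 1):
        revised = generate(f"Improve this answer: {best}", seed=step)
        if score(revised) > score(best):
            best = revised
    return best

if __name__ == "__main__":
    question = "What is 17 * 24?"
    print("parallel  :", parallel_sampling(question))
    print("sequential:", sequential_revision(question))
```

Both strategies trade extra model calls for answer quality: best-of-n parallelizes easily across calls, while sequential revision can correct earlier mistakes but adds wall-clock latency.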
Flip the Lights On and Off with One Click: Google Pushes Cinematic Lighting Control to the Extreme with Diffusion Models
机器之心 (Synced) · 2025-05-16 04:39
Report from 机器之心 (Synced). Editors: 刘欣, +0

Google recently released LightLab, a project that gives precise control over the lighting in an image. From a single photograph, it offers fine-grained parametric control over light sources: users can change the intensity and color of visible light sources, adjust the ambient light intensity, and even insert virtual light sources into the scene.

Take film as an example. In a good film, light subtly shapes a character's mood, sets the atmosphere of the story, guides the audience's gaze, and can even reveal a character's inner world. Yet precisely controlling the direction, color, and intensity of light, whether in traditional photographic post-processing or in adjustments after digital rendering, has always been a time-consuming, labor-intensive task that depends heavily on experience.

Existing lighting-editing techniques either need many photos to work (so they do not apply to a single image), or they can edit the light but give no precise control over how it changes (for example, exactly how much brighter, or to what color).

Google's research team fine-tuned a diffusion model on a specially constructed dataset so that it learns to precisely control the lighting in an image (a toy sketch of this conditioning idea follows the links below).

LightLab: Controlling Light Sources in Images with Diffusion Models

Paper: https://arxiv.org/abs/2505.09608
Project page: https://nadmag.github. ...
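To illustrate the general idea of giving a diffusion-style model parametric control over lighting, here is a toy sketch of conditioning a denoiser on light parameters (intensity and RGB color). This is not LightLab's architecture, dataset, or training procedure; the tiny network, random stand-in data, and fixed noise level are assumptions made purely for illustration.

```python
# Toy illustration of parametric light conditioning for a diffusion-style
# denoiser. This is NOT LightLab's actual architecture or data pipeline;
# the network, data, and noise schedule here are simplified stand-ins.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Tiny denoiser conditioned on (intensity, r, g, b) light parameters."""
    def __init__(self, cond_dim: int = 4, channels: int = 3):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, 16)          # embed light parameters
        self.net = nn.Sequential(
            nn.Conv2d(channels + 16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, noisy_img: torch.Tensor, light_params: torch.Tensor) -> torch.Tensor:
        cond = self.cond_proj(light_params)               # (B, 16)
        cond_map = cond[:, :, None, None].expand(-1, -1, *noisy_img.shape[-2:])
        return self.net(torch.cat([noisy_img, cond_map], dim=1))

# One simplified training step: predict the noise added to the relit target,
# conditioned on the desired light intensity and color.
model = ConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

target_relit = torch.rand(8, 3, 64, 64)                  # stand-in for relit target images
light_params = torch.rand(8, 4)                          # (intensity, r, g, b) per image
noise = torch.randn_like(target_relit)
noisy = target_relit + 0.3 * noise                       # crude fixed noise level

pred_noise = model(noisy, light_params)
loss = nn.functional.mse_loss(pred_noise, noise)
loss.backward()
opt.step()
print(f"toy loss: {loss.item():.4f}")
```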