Skywork UniPic 2.0

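Six releases in one week! Kunlun Wanwei pushes the multimodal AI race to new heights
QbitAI · 2025-08-17 09:00
By Yishui, from Aofeisi
QbitAI | WeChat official account QbitAI

They have gone all out! Six models released in a single week.

Kunlun Wanwei, firing on all cylinders, is pushing the multimodal AI race to new heights. From August 11 to 15, the company dropped a new model every day, covering hot areas such as video generation, world models, unified multimodal models, agents, and AI music creation. Nearly every one of them is a core scenario for multimodal AI applications.

Summed up in a table:

| Day | Product/Model | Description |
| --- | --- | --- |
| Day 1 | SkyReels-A3 | An audio-driven portrait video generation model |
| Day 2 | Matrix-Game 2.0 | A homegrown open-source counterpart to Genie 3; an upgraded interactive world model |
| | Matrix-3D | A large model for 3D scene generation |
| Day 3 | Skywork UniPic 2.0 | Unified multimodal understanding, generation, and editing in one model |
| Day 4 | Skywork Deep Research Agent v2 | A dual upgrade: multimodal deep research and a browser agent |
| Day 5 | Mureka V7.5 | New breakthroughs in Chinese music generation and descriptive TTS |

Right out of the gate, Kunlun Wanwei threw out something aimed squarely at ...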
Tencent Research Institute AI Weekly Keywords Top 50
Tencent Research Institute · 2025-08-16 02:33
Group 1: Chip Industry
- Export licensing fees are affecting Nvidia and AMD [3]
- The U.S. is embedding trackers in chip exports [3]

Group 2: Computing Power
- Tesla's Dojo team has been disbanded [3]
- Inspur is launching super-node AI servers [3]

Group 3: AI Models
- OpenAI's GPT-4o is making a comeback [3]
- GPT-5 Pro is being developed by OpenAI [3]
- Zhipu's GLM-4.5 has been released [3]
- Kunlun Wanwei's SkyReels-A3 is now available [3]
- Zhipu has open-sourced GLM-4.5V [3]
- Tencent has introduced its Large-Vision model [3]
- Anthropic is working on a million-token-context model [3]
- Kunlun Wanwei's Skywork UniPic 2.0 has been launched [3]

Group 4: AI Applications
- xAI has made Grok 4 available for free [3]
- Tencent's CubeMe is integrating with Hunyuan [3]
- Alibaba is developing embodied intelligence components [3]
- Baichuan Intelligence has released Baichuan-M2 [3]
- OpenAI has earned a gold medal at the IOI [3]
- Kunlun Wanwei's Matrix-3D is now available [3]
- SenseTime has introduced AI tools for film production [4]
- Apple's new Siri is being developed [4]
- Pika is working on audio-driven performances [4]
- Claude Code has launched an Opus planning mode [4]
- Kunlun Wanwei's Deep Research Agent v2 is now available [4]
- Tencent's Hunyuan-GameCraft is being developed [4]
- Microsoft has outlined five modes for AI agents [4]
- The OpenCUA framework is being developed by HKU and others [4]

Group 5: Technology Developments
- Over 100 robots were showcased at the World Robot Conference [4]
- Agile intelligent robots are being developed by Lingqiao Intelligent [4]
- Figure is working on robots that can fold clothes [4]
- Apple's AI suite is being expanded [4]
- Zhiyuan Robotics has launched an open-source world model platform [4]

Group 6: Industry Insights
- Wang Xingxing discusses the development of embodied intelligence [4]
- Product Hunt highlights AI product releases [4]
- Nvidia and others are exploring physical AI [4]
- Bi Shuchao analyzes the Scaling Law [4]
- Artificial Analysis discusses the application of large models [4]
- Overseas developers are conducting programming ability assessments [4]
- DeepMind emphasizes the importance of Genie 3 [4]
- Notion is working on AI product standards [4]
- Greg Brockman addresses algorithm bottlenecks [4]
- Wang Xiaochuan discusses medical large models [4]

Group 7: Capital Movements
- Meta has acquired WaveForms [4]
- Periodic Labs is securing funding for AI materials [4]
- OpenAI is investing in brain-machine interfaces [4]
- Perplexity has made an offer to acquire Chrome [4]

Group 8: Events
- OpenAI is involved in AI chess events [4]
- GitHub is being merged into Microsoft's CoreAI team [4]
Caixin Securities Morning Meeting Minutes, 2025-08-14
Caixin Securities· 2025-08-13 23:30
Market Strategy
- The market has surged again, with the Shanghai Composite Index breaking above its previous high of October 8, 2024 [4][6]
- The broad A-share market, as measured by the Wind All A Index, rose 1.02% to close at 5801.59 points, while the Shanghai Composite Index gained 0.48% to 3683.46 points [6][7]
- Small-cap stocks outperformed large caps, with the CSI 1000 Index rising 1.45% [7]

Industry Dynamics
- Kunlun Wanwei (300418.SZ) has officially open-sourced the "Skywork UniPic 2.0" model, which includes three core modules for image editing and generation [24][25]
- Apple has introduced a new technology in its smart glasses that uses adjustable lenses to correct nearsightedness and improve visual comfort [27][28]

Company Tracking
- Guizhou Moutai (600519.SH) reported steady growth in revenue and net profit for H1 2025, with revenue of RMB 89.389 billion and net profit of RMB 45.403 billion, up 9.10% and 8.89% year-on-year respectively [30][31]
- Huajin Co., Ltd. (000059.SZ) saw its performance decline on weak terminal demand, with total revenue of RMB 20.104 billion, down 5.01% year-on-year [33][34]
- Rongchang Bio (688331.SH) announced that its drug telitacicept (Tai Ta Xi Pu), for treating primary Sjögren's syndrome, met its primary endpoint in a Phase III clinical trial [35]
- Zhongtian Technology (600522.SH) plans to invest USD 80 million to establish a wholly-owned subsidiary in Saudi Arabia to strengthen its competitiveness in the local market [36][37]
- Zhuzhou Smelter Group (600961.SH) achieved a net profit of RMB 585 million in H1 2025, a year-on-year increase of 57.83% [38][39]
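
The index moves quoted under Market Strategy give only the closing level and the daily percentage change. As a rough illustration, the previous close they imply can be backed out as close / (1 + change). The function name `implied_previous_close` is a hypothetical helper introduced here, and because the published percentages are rounded to two decimals the results are approximations, not official figures.

```python
def implied_previous_close(close: float, pct_change: float) -> float:
    """Back out the prior close from today's close and the daily % change.

    Assumes pct_change is expressed in percent (1.02 means +1.02%).
    """
    return close / (1.0 + pct_change / 100.0)


# Closing levels and daily changes as quoted in the minutes above.
quoted = {
    "Wind All A Index": (5801.59, 1.02),
    "Shanghai Composite Index": (3683.46, 0.48),
}

for name, (close, pct) in quoted.items():
    prev = implied_previous_close(close, pct)
    print(f"{name}: implied previous close of about {prev:.2f}")
```

The same division applies to any "rose X% to Y" statement, as long as X is measured against the prior value.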
Tencent Research Institute AI Express, 2025-08-14
Tencent Research Institute · 2025-08-13 16:01
Group 1
- OpenAI and co-founder Sam Altman are backing a new brain-computer interface company, Merge Labs, which is expected to be valued at $850 million and will compete directly with Elon Musk's Neuralink [1]
- Altman will co-found Merge Labs but will not be involved in daily management, in line with the vision of human-machine integration from his 2017 blog post [1]
- Unlike Neuralink, which has conducted human clinical trials, Merge Labs is in its early stages, but it aims to develop simpler and more practical brain-computer interfaces by leveraging advances in AI [1]

Group 2
- Anthropic announced that Claude Sonnet 4 now supports a context window of up to 1 million tokens, five times its previous capacity, allowing it to handle over 75,000 lines of code or multiple research papers in a single request [2]
- Pricing has been adjusted for the extended context: input costs $3 per million tokens for prompts under 200K tokens and $6 beyond that, while output costs $15 and $22.5 per million tokens respectively [2]
- The feature is currently in public beta on Amazon Bedrock and will soon be available on Google Cloud's Vertex AI platform; early partners say it enables true "production-grade AI engineering" capabilities [2]

Group 3
- Kunlun Wanwei has open-sourced the Skywork UniPic 2.0 model, a unified multimodal framework for understanding, generating, and editing images that targets "efficient, high-quality, and unified" results [3]
- The model consists of three core modules: an image editing module based on SD3.5-Medium, a connector to pre-trained multimodal capabilities, and a Flow-GRPO progressive dual-task reinforcement strategy [3]
- The UniPic2-SD3.5M-Kontext-2B model surpasses the 12B-parameter Flux.dev on image generation metrics and outperforms Flux-Kontext, at that same parameter count, on editing [3]

Group 4
- AI startup Perplexity has made a formal offer to acquire Google's Chrome browser business for $34.5 billion in cash, nearly double its own valuation of $18 billion [4]
- The timing of the proposal coincides with Google's ongoing antitrust litigation with the U.S. Department of Justice [4]
- Perplexity has committed to maintaining the Chromium open-source project and investing over $3 billion within two years of the acquisition; Google, however, has expressed no intention to sell Chrome, so market expectations for the deal are low [4]

Group 5
- Pika has launched an "audio-driven performance model" that combines static images with audio to generate highly synchronized videos, achieving precise lip-syncing and natural changes of expression [5]
- The technology matches the image subject to the audio content and produces 720p HD videos in an average of just 6 seconds, with no length limit [5]

Group 6
- Figure has demonstrated a humanoid robot capable of folding clothes, showing that its original logistics sorting capabilities can be extended simply by adding data [6]
- The robot exhibited human-like behaviors such as eye contact, nodding, and gestures, controlled by an end-to-end vision-language-action model [6]
- Folding clothes is a challenging dexterous task for robots because clothing is deformable and varies in shape, but Figure achieved it with the Helix architecture without changing the underlying structure [6]

Group 7
- DeepMind's founder Demis Hassabis revealed that Genie 3 not only generates virtual worlds but also allows these worlds to operate in reality, supporting agent training [7]
- The team has begun testing the Sima agent within the worlds generated by Genie 3, marking a breakthrough in "AI running in another AI's brain" [7]
- Hassabis believes that model evaluation will be crucial for future AI development, with Game Arena serving as an important benchmark thanks to its "immediate feedback" and "adaptive difficulty" [7]

Group 8
- Notion's founder Ivan Zhao stated that successful AI products should aim for a score of 7.5, emphasizing the need to create an "AI workspace" that shifts AI from merely providing tools to delivering "the work itself" [8]
- He compared AI product development to "brewing beer" rather than "building bridges": it often achieves only 70-80% of the desired functionality and requires extensive experimentation [8]
- Zhao highlighted the importance of balancing craftsmanship and practicality in AI products, noting that an excessive pursuit of perfection can detract from commercial value, and stressed in particular the significance of context integration in AI applications [8]

Group 9
- OpenAI co-founder Greg Brockman noted that AI development is currently in a "return to foundational research" phase, in which algorithms, rather than sheer scale, are once again the critical bottleneck [9]
- He described future AI infrastructure as needing to balance "long-duration heavy computation" with "real-time responsiveness," suggesting that homogeneous accelerators are a good starting point [9]
- Brockman predicts that the AI ecosystem will show a "blooming" pattern rather than converge on a single model, and that achieving tenfold economic growth from AI will require experts across fields to think deeply about how it is applied [9]
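
The Claude Sonnet 4 long-context pricing quoted in Group 2 can be turned into a rough per-request estimate. The sketch below is only an illustration under stated assumptions: that the tier is chosen by the size of the input prompt, that the higher rates then apply to every token in the request, and that a prompt of exactly 200K tokens still counts as the lower tier. The function name `request_cost_usd` and the constants are hypothetical, not part of any Anthropic SDK.

```python
# Rates in USD per million tokens, as quoted in the digest above.
STANDARD_INPUT, STANDARD_OUTPUT = 3.0, 15.0
LONG_INPUT, LONG_OUTPUT = 6.0, 22.5

# Assumption: prompts above this size fall into the long-context tier.
LONG_CONTEXT_THRESHOLD = 200_000


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under the assumed two-tier pricing."""
    is_long = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate = LONG_INPUT if is_long else STANDARD_INPUT
    output_rate = LONG_OUTPUT if is_long else STANDARD_OUTPUT
    total = input_tokens * input_rate + output_tokens * output_rate
    return total / 1_000_000


# Compare a standard-tier prompt with a long-context one.
for prompt_size in (100_000, 500_000):
    cost = request_cost_usd(prompt_size, output_tokens=10_000)
    print(f"{prompt_size:>7} input tokens: ${cost:.3f}")
```

Under these assumptions, crossing the threshold doubles the input rate and raises the output rate by half, so a long prompt costs more per token as well as in total.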
Kunlun Wanwei Open-Sources the "Skywork UniPic 2.0" Model
Zheng Quan Ri Bao Wang· 2025-08-13 06:16
Group 1
- Kunlun Wanwei Technology Co., Ltd. is holding its SkyWork AI technology release week from August 11 to August 15, unveiling a new model each day with a focus on cutting-edge models for core multimodal AI scenarios [1]
- So far, Kunlun Wanwei has released the SkyReels-A3, Matrix-Game 2.0, and Matrix-3D models [1]
- On August 13, Kunlun Wanwei officially open-sourced the "Skywork UniPic 2.0" model, which aims to provide an efficient training and inference framework for unified multimodal modeling [1]

Group 2
- The "Skywork UniPic 2.0" model consists of three core modules: image generation and editing, unified model capabilities, and post-training for image generation and editing [1]
- The image generation and editing module has been improved to accept both text and image inputs, and its capabilities have been expanded by training on high-quality image generation and editing data [2]
- The unified model capability is achieved by freezing the image generation and editing module and using a multimodal model (Qwen2.5-VL-7B) together with a pre-trained connector to build integrated understanding, generation, and editing capabilities [2]
- To improve overall performance, a progressive dual-task reinforcement strategy based on Flow-GRPO has been designed for post-training, so that the generation and editing tasks are optimized collaboratively without interfering with each other [2]
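
The idea of freezing the generation and editing module while a connector bridges it to a multimodal model can be illustrated with a deliberately tiny, pure-Python toy. This is not Skywork's implementation: each "module" here is a single scalar weight, the class `Scale`, the function `train_connector`, and all numeric values are hypothetical, and keeping the understanding model fixed as well is an assumption that the summary above does not confirm. The point is only to show gradients flowing through frozen modules while the connector alone is updated.

```python
from dataclasses import dataclass


@dataclass
class Scale:
    """A one-parameter stand-in for a network module: y = weight * x."""
    weight: float
    trainable: bool = True

    def __call__(self, x: float) -> float:
        return self.weight * x


def train_connector(steps: int = 200, lr: float = 0.05):
    # Toy stand-ins; only the connector is marked trainable.
    understanding = Scale(2.0, trainable=False)  # multimodal model (assumed fixed)
    connector = Scale(0.1, trainable=True)       # bridge being trained
    generator = Scale(0.5, trainable=False)      # frozen generation/editing module
    modules = [understanding, connector, generator]
    x, target = 1.0, 3.0

    for _ in range(steps):
        # Forward pass, keeping each module's input for the backward pass.
        acts = [x]
        for m in modules:
            acts.append(m(acts[-1]))
        # Squared-error loss: (y - target)^2, so dloss/dy = 2 * (y - target).
        upstream = 2.0 * (acts[-1] - target)
        # Backward pass: gradients pass through every module,
        # but only trainable weights are updated.
        for i in reversed(range(len(modules))):
            m = modules[i]
            grad_w = upstream * acts[i]
            upstream = upstream * m.weight
            if m.trainable:
                m.weight -= lr * grad_w
    return understanding, connector, generator


understanding, connector, generator = train_connector()
print(understanding.weight, connector.weight, generator.weight)
```

In a real framework the same effect is usually achieved by disabling gradient updates for the frozen modules and passing only the connector's parameters to the optimizer.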