Skywork UniPic 2.0
AI Developments Roundup: Zhiyuan Launches Robot World-Model Platform Genie Envisioner; Zhipu Releases GLM-4.5V Visual Reasoning Model
China Post Securities · 2025-08-25 11:47
- The Genie Envisioner platform introduces a video-centric world-modeling paradigm that models robot-environment interactions directly in visual space, preserving spatial structure and temporal evolution. This improves cross-domain generalization and long-sequence task execution: it achieves a 76% success rate on long-step tasks such as folding cardboard boxes, versus 48% for the π0 model [12][13][16].
- The Genie Envisioner platform comprises three core components: GE-Base, a multi-view video world foundation model trained on 3000 hours of real robot data; GE-Act, a lightweight 160M-parameter action decoder that enables real-time control; and GE-Sim, a hierarchical action-conditioned simulator for closed-loop policy evaluation and large-scale data generation [16][17][19].
- The GLM-4.5V visual reasoning model, with 106B total parameters and 12B active parameters, achieves state-of-the-art (SOTA) performance across 41 multimodal benchmarks, including image, video, document understanding, and GUI agent tasks. It incorporates 3D-RoPE and bicubic interpolation to improve perception of 3D spatial relationships and adaptability to high resolutions [20][21][22].
- GLM-4.5V uses a three-stage training strategy: pretraining on large-scale multimodal corpora, supervised fine-tuning on "chain of thought" samples, and reinforcement learning with RLVR and RLHF. This layered training yields strong document processing capabilities and emergent abilities such as generating structured HTML/CSS/JavaScript code from screenshots or videos [23][24][26].
- VeOmni, a fully modular multimodal training framework, decouples model definition from distributed parallel logic, enabling flexible parallel strategies such as FSDP, HSDP+SP, and EP. It achieves 43.98% MFU for 64K-sequence training and supports sequence lengths up to 192K, reducing engineering complexity and improving efficiency by over 90% [27][28][31].
- VeOmni introduces asynchronous sequence parallelism (Async-Ulysses) and COMET technology for MoE models, achieving linear scaling of training throughput for 30B-parameter models at sequence lengths up to 160K. It also integrates dynamic batching and FlashAttention to minimize memory waste and optimize operator-level recomputation [31][32][34].
- Skywork UniPic 2.0, a unified multimodal framework, integrates image understanding, text-to-image (T2I) generation, and image-to-image (I2I) editing in a single model. It uses a progressive dual-task reinforcement strategy (Flow-GRPO) to optimize image editing and T2I tasks sequentially, achieving strong results on benchmarks such as GenEval and GEdit-EN [35][38][39].
- UniPic 2.0 leverages Skywork-EditReward, a reward model specific to image editing, to provide pixel-level quality scores. The design enables precise recognition of image elements and generation of corresponding textual descriptions, scoring 83.5 points on MMBench, comparable to 19B-parameter models [38][42][43].
- FlowReasoner, a query-level meta-agent framework, dynamically generates a personalized multi-agent system for each query. It uses GRPO reinforcement learning with multi-objective rewards, achieving 92.15% accuracy on the MBPP dataset and outperforming baselines such as AFlow and LLM-Blender [63][64][68].
- FlowReasoner is trained in three stages: supervised fine-tuning with synthetic data, SFT for workflow generation, and RL with external feedback for capability enhancement. It generalizes robustly, maintaining high accuracy even when the base worker model is replaced [66][68][69].
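The 43.98% figure quoted for VeOmni above is a model FLOPs utilization (MFU). As a reminder of what that metric measures, the conventional definition can be written as below. The symbol names are ours, not VeOmni's, and the summary does not say which FLOPs-counting convention was used.

```latex
\[
\mathrm{MFU} \;=\; \frac{F_{\text{token}} \times T}{N \times P}
\]
% F_token : model FLOPs needed per training token (forward + backward pass)
% T       : observed training throughput in tokens per second
% N       : number of accelerators used
% P       : peak FLOPs per second of one accelerator
```

Read this way, an MFU of 43.98% means the run sustained a little under half of the hardware's theoretical peak on useful model computation.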
Kunlun Wanwei: Summary of the 2025 Semi-Annual Report
Zheng Quan Zhi Xing · 2025-08-22 16:36
Core Viewpoint - Kunlun Wanwei Technology Co., Ltd. reported a significant increase in revenue but also a substantial net loss for the first half of 2025, indicating challenges in profitability despite growth in its AI business and overseas revenue [1][3].

Financial Performance
- Total revenue for the reporting period reached 3.733 billion yuan, a year-on-year increase of 49.23% [1][3].
- Net profit attributable to shareholders was -855.55 million yuan, a decline of 119.86% compared to the previous year [1][3].
- Net cash flow from operating activities was -560.46 million yuan, a decrease of 396.40% year-on-year [1][3].
- Basic and diluted earnings per share were both -0.69 yuan, a 115.63% decline [1][3].
- Total assets at the end of the reporting period were 20.33 billion yuan, down 1.68% from the previous year [1][3].

Business Development
- The company is focusing on AI business development, particularly large models and multimodal technology, with significant investments in AGI and AIGC [3][4].
- Overseas revenue reached 3.441 billion yuan, growing by 56.02% and accounting for 92.17% of total revenue, an increase of 4 percentage points year-on-year [3][4].
- The company launched several industry-leading models in fields including reinforcement learning and multimodal reasoning, enhancing its competitive edge [4][5][6].

AI Product Innovations
- The company released the Skywork-Reward-V2 series of reward models, setting industry benchmarks in reinforcement learning [4].
- The Skywork-R1V multimodal visual reasoning model was open-sourced, achieving high scores on visual reasoning benchmarks [5].
- The Matrix-Zero world model was introduced, enabling real-time interactive 3D scene generation and improving the user experience in digital content production [7].

Future Outlook
- The company plans to deepen its AI capabilities, focusing on technology productization and industrialization while expanding commercial partnerships [3][8].
- The new AI Office product, TianGong Super Agent, aims to revolutionize office and content creation paradigms, supporting cross-platform collaboration [9][10].
- The company is committed to continuous investment in large model development and AI technology innovation to drive future growth [8][9].
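The revenue figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, the inputs are the rounded values quoted above, and the results therefore differ slightly from the reported percentages (the overseas share works out to roughly 92.18% against the reported 92.17%).

```python
# Rounded figures quoted in the summary (billion yuan, H1 2025).
total_revenue = 3.733
overseas_revenue = 3.441
revenue_growth = 0.4923  # 49.23% year-on-year

# Share of revenue earned overseas, recomputed from the rounded inputs.
overseas_share = overseas_revenue / total_revenue

# Revenue implied for H1 2024 by the stated growth rate.
implied_prior_revenue = total_revenue / (1 + revenue_growth)

print(f"overseas share: {overseas_share:.2%}")
print(f"implied H1 2024 revenue: {implied_prior_revenue:.3f} billion yuan")
```

The implied prior-year revenue of about 2.5 billion yuan is a derived estimate, not a number stated in the report summary.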
Six Releases in One Week! Kunlun Wanwei Pushes Multimodal AI to New Heights
QbitAI (量子位) · 2025-08-17 09:00
Core Viewpoint - Kunlun Wanwei has launched six new models in one week, showcasing its advances in multimodal AI applications, including video generation, world models, and AI music creation, and signaling a strategic push in the AI sector [2][5][63].

Group 1: Model Launches
- The company released the SkyReels-A3 model, designed for digital-human live streaming, which generates realistic videos driven by audio input and targets e-commerce scenarios [9][10][16].
- Matrix-Game 2.0, an upgraded interactive world model, was introduced with real-time generation and long-sequence capabilities, positioning it as a competitor to Google's Genie 3 [19][20][22].
- The Matrix-3D model was launched, integrating panoramic video generation and 3D reconstruction and breaking down barriers between content generation and interaction [25][27].
- Skywork UniPic 2.0 was unveiled as a unified multimodal model capable of image understanding, generation, and editing, demonstrating a new training paradigm that reduces hardware requirements [29][31][33].
- The Skywork Deep Research Agent v2 was released, adding multimodal capabilities for deep research and content generation [37][38].
- Mureka V7.5, a music generation model focused on Chinese music, was launched with significant improvements in emotional expression and musicality [53][54][56].

Group 2: Strategic Insights
- Kunlun Wanwei's strategy emphasizes vertical integration in AI, focusing on high-frequency application scenarios rather than general-purpose agents, which is seen as a more viable approach for future development [70][72][76].
- The company has committed substantial resources to R&D, with R&D expenditure of 1.54 billion yuan in 2024, a 59.5% year-on-year increase, and a workforce of 1,554 dedicated to AI research [73][74].
- The open-source approach adopted by Kunlun Wanwei has positioned it as a leader in the AI ecosystem, contributing to its recognition as one of the "Top 16 AI Open Source Companies in China" [5][78].
Tencent Research Institute: Top 50 AI Keywords of the Week
Tencent Research Institute · 2025-08-16 02:33
Group 1: Chip Industry
- Export licensing fees are impacting Nvidia and AMD [3]
- The U.S. is embedding trackers in chip exports [3]

Group 2: Computing Power
- Tesla's Dojo team has been disbanded [3]
- Inspur is launching super-node AI servers [3]

Group 3: AI Models
- OpenAI's GPT-4o is making a comeback [3]
- GPT-5 Pro is being developed by OpenAI [3]
- Zhipu's GLM-4.5 has been released [3]
- Kunlun Wanwei's SkyReels-A3 is now available [3]
- Zhipu has open-sourced GLM-4.5V [3]
- Tencent has introduced a Large-Vision model [3]
- Anthropic is working on a million-token-context model [3]
- Kunlun Wanwei's Skywork UniPic 2.0 has been launched [3]

Group 4: AI Applications
- xAI has made Grok 4 available for free [3]
- Tencent's CubeMe is integrating with Hunyuan [3]
- Alibaba is developing embodied intelligence components [3]
- Baichuan Intelligence has released Baichuan-M2 [3]
- OpenAI has achieved a gold-medal result at the IOI [3]
- Kunlun Wanwei's Matrix-3D is now available [3]
- SenseTime has introduced AI tools for film production [4]
- Apple's new Siri is being developed [4]
- Pika is working on audio-driven performances [4]
- Claude Code has launched Opus planning mode [4]
- Kunlun Wanwei's Deep Research Agent v2 is now available [4]
- Tencent's Hunyuan-GameCraft is being developed [4]
- Microsoft has outlined five modes for AI agents [4]
- The OpenCUA framework is being developed by HKU and others [4]

Group 5: Technology Developments
- Over 100 robots were showcased at the World Robot Conference [4]
- Agile intelligent robots are being developed by Lingqiao Intelligent [4]
- Figure is working on robots that can fold clothes [4]
- Apple's AI suite is being expanded [4]
- Zhiyuan Robotics has launched an open-source world model platform [4]

Group 6: Industry Insights
- Wang Xingxing discusses the development of embodied intelligence [4]
- Product Hunt highlights AI product releases [4]
- Nvidia and others are exploring physical AI [4]
- Scaling Law is being analyzed by Bi Shuchao [4]
- The application of large models is discussed by Artificial Analysis [4]
- Programming ability assessments are being conducted by foreign developers [4]
- DeepMind emphasizes the importance of Genie 3 [4]
- Notion is working on AI product standards [4]
- Greg Brockman addresses algorithm bottlenecks [4]
- Wang Xiaochuan discusses medical large models [4]

Group 7: Capital Movements
- Meta has acquired WaveForms [4]
- Periodic Labs is securing funding for AI materials [4]
- OpenAI is investing in brain-machine interfaces [4]
- Perplexity has made an offer to acquire Chrome [4]

Group 8: Events
- OpenAI is involved in AI chess events [4]
- GitHub has been merged into CoreAI [4]
Kunlun Wanwei Officially Kicks Off SkyWork AI Technology Release Week
Zhong Zheng Wang · 2025-08-14 12:13
Core Insights
- Kunlun Wanwei has launched the SkyWork AI technology release week, introducing a new model each day from August 11 to August 15, covering core scenarios in cutting-edge multi-modal AI [1].
- The Skywork Deep Research Agent v2, released on August 14, serves as the core engine for the Skywork Super Agents, significantly enhancing the role of large models in the AI Office domain [1][3].

Technology Breakthroughs
- The Skywork team has achieved breakthroughs in four key areas: multi-modal crawling technology (MM-Crawler), long-distance multi-modal information collection, an asynchronous parallel multi-agent understanding architecture, and multi-modal result presentation [2].
- The new version of Skywork Deep Research Agent v2 integrates text and image reading, giving users comprehensive, smooth, and visually friendly deep reports [2].

Performance and Capabilities
- The Skywork Browser Agent simulates human browsing and interaction, changing traditional data collection and analysis methods and addressing multiple pain points of conventional browser agents [3].
- The Skywork Deep Research Agent v2 incorporates several enhancement mechanisms, including high-quality data synthesis and training, end-to-end reinforcement learning, efficient parallel reasoning, and a multi-agent self-learning evolution system, achieving state-of-the-art performance in multiple agent task evaluations [3].

Evaluation and Results
- On BrowseComp, an authoritative search evaluation leaderboard, Skywork Deep Research outperformed most similar products, achieving an accuracy of 27.8% in standard mode [4].
- With the proprietary "Parallel Thinking" mode, accuracy rises to 38.7%, a new industry SOTA, and performance improves as thinking time increases [4].
Caixin Securities Morning Meeting Minutes, 2025-08-14
Caixin Securities · 2025-08-13 23:30
Market Strategy
- The market has surged again, with the Shanghai Composite Index breaking through its previous high from October 8, 2024 [4][6].
- The overall A-share market, represented by the Wind All A Index, rose 1.02% to close at 5801.59 points, while the Shanghai Composite Index rose 0.48% to 3683.46 points [6][7].
- Small-cap stocks outperformed large-cap stocks, with the CSI 1000 Index rising 1.45% [7].

Industry Dynamics
- Kunlun Wanwei (300418.SZ) has officially open-sourced the "Skywork UniPic 2.0" model, which includes three core modules for image editing and generation [24][25].
- Apple has introduced a new technology in its smart glasses that uses adjustable lenses to correct nearsightedness and improve visual comfort [27][28].

Company Tracking
- Guizhou Moutai (600519.SH) reported stable growth in revenue and net profit for H1 2025, with revenue of 89.389 billion yuan and net profit of 45.403 billion yuan, year-on-year increases of 9.10% and 8.89% respectively [30][31].
- Huajin Co., Ltd. (000059.SZ) saw its performance decline on weak terminal demand, with total revenue of 20.104 billion yuan, down 5.01% year-on-year [33][34].
- Rongchang Bio (688331.SH) announced that its drug Tai Ta Xi Pu (telitacicept), for treating primary Sjögren's syndrome, met its primary endpoint in a Phase III clinical trial [35].
- Zhongtian Technology (600522.SH) plans to invest 80 million USD to establish a wholly owned subsidiary in Saudi Arabia to strengthen its competitiveness in the local market [36][37].
- Zhuzhou Smelter Group (600961.SH) achieved a net profit of 585 million yuan in H1 2025, a year-on-year increase of 57.83% [38][39].
Tencent Research Institute AI Express, 2025-08-14
Tencent Research Institute · 2025-08-13 16:01
Group 1
- OpenAI and co-founder Sam Altman are backing a new brain-computer interface company, Merge Labs, which is expected to be valued at $850 million and will compete directly with Elon Musk's Neuralink [1].
- Altman will co-found Merge Labs but will not be involved in daily management; the venture aligns with the vision of human-machine integration he described in a 2017 blog post [1].
- Unlike Neuralink, which has conducted human clinical trials, Merge Labs is at an early stage, but it aims to develop simpler and more practical brain-computer interfaces by leveraging advances in AI [1].

Group 2
- Anthropic announced that Claude Sonnet 4 now supports a context window of up to 1 million tokens, five times its previous capacity, allowing it to handle over 75,000 lines of code or multiple research papers in a single request [2].
- Pricing has been adjusted for the extended context: input costs $3 per million tokens for inputs under 200K and $6 per million tokens for inputs exceeding that, while output is priced at $15 and $22.5 per million tokens respectively [2].
- The feature is currently in public beta on Amazon Bedrock and will soon be available on Google Cloud's Vertex AI platform; early partners say it enables true "production-grade AI engineering" capabilities [2].

Group 3
- Kunlun Wanwei has open-sourced the Skywork UniPic 2.0 model, a unified multimodal framework for understanding, generating, and editing images that targets "efficient, high-quality, and unified" results [3].
- The model consists of three core modules: an image editing module based on SD3.5-Medium, a connector for pre-trained multimodal capabilities, and a Flow-GRPO progressive dual-task reinforcement strategy [3].
- The UniPic2-SD3.5M-Kontext-2B model surpasses the 12B-parameter Flux.dev on image generation metrics and outperforms Flux-Kontext, at the same parameter count, on editing [3].

Group 4
- AI startup Perplexity has made a formal offer to acquire Google's Chrome browser business for $34.5 billion in cash, nearly double its own valuation of $18 billion [4].
- The timing of the proposal coincides with Google's ongoing antitrust litigation with the U.S. Department of Justice [4].
- Perplexity has committed to maintaining the Chromium open-source project and investing over $3 billion within two years of the acquisition. Google has expressed no intention to sell Chrome, so market expectations for the deal's success are low [4].

Group 5
- Pika has launched an "audio-driven performance model" that combines static images with audio to generate highly synchronized videos, with precise lip-syncing and natural changes of expression [5].
- The technology matches the image subject to the audio content, producing 720p HD videos in an average of just 6 seconds, with no length limitations [5].

Group 6
- Figure has demonstrated a humanoid robot that can fold clothes, showing that the model originally used for logistics sorting can be extended to the new task simply by adding data [6].
- The robot exhibited human-like behaviors such as eye contact, nodding, and gestures, controlled by an end-to-end vision-language-action model [6].
- Folding clothes is a challenging dexterous task for robots because clothing is deformable and varies widely in shape, but Figure achieved it with the Helix architecture without changing the underlying structure [6].

Group 7
- DeepMind's founder Demis Hassabis revealed that Genie 3 not only generates virtual worlds but also allows these worlds to operate in reality, supporting agent training [7].
- The team has begun testing the Sima agent within worlds generated by Genie 3, a breakthrough described as "AI running in another AI's brain" [7].
- Hassabis believes that model evaluation will be crucial for future AI development, with Game Arena serving as an important benchmark because of its "immediate feedback" and "adaptive difficulty" [7].

Group 8
- Notion's founder Ivan Zhao stated that successful AI products should aim for a score of 7.5, emphasizing the need to create an "AI workspace" that shifts AI from merely providing tools to delivering "the work itself" [8].
- He compared AI product development to "brewing beer" rather than "building bridges": it often achieves only 70-80% of the desired functionality and requires extensive experimentation [8].
- Zhao highlighted the importance of balancing craftsmanship and practicality in AI products, noting that excessive pursuit of perfection can detract from commercial value, and stressed the significance of context integration in AI applications [8].

Group 9
- OpenAI co-founder Greg Brockman noted that AI development is in a "return to foundational research" phase, in which algorithms, rather than scale alone, are once again the critical bottleneck [9].
- He said future AI infrastructure will need to balance "long-duration heavy computation" with "real-time responsiveness," and suggested that homogeneous accelerators are a good starting point [9].
- Brockman predicts that the AI ecosystem will "bloom" into many models rather than converge on a single one, and that achieving tenfold economic growth from AI will require experts across many fields to think deeply about how to apply it [9].
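The tiered Claude Sonnet 4 pricing quoted in Group 2 above can be turned into a quick cost estimate. This is a minimal sketch, not Anthropic's billing logic: the function name is hypothetical, and it assumes that a request whose input exceeds 200K tokens is billed entirely at the higher input and output rates, and that an input of exactly 200K tokens falls in the lower tier.

```python
def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from the per-million-token prices quoted above."""
    # Assumption: the tier is chosen by input size and applies to the whole request.
    long_context = input_tokens > 200_000
    input_rate = 6.00 if long_context else 3.00      # USD per million input tokens
    output_rate = 22.50 if long_context else 15.00   # USD per million output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A short request versus one that uses the extended context window.
short_cost = estimate_request_cost(100_000, 2_000)
long_cost = estimate_request_cost(500_000, 2_000)
print(f"short: ${short_cost:.3f}, long: ${long_cost:.3f}")
```

Under these assumptions the 500K-token request costs roughly nine times the 100K-token one, because both the token count and the per-token rate increase.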
Kunlun Wanwei Open-Sources the "Skywork UniPic 2.0" Model
Zheng Quan Ri Bao Wang · 2025-08-13 06:16
Group 1
- Kunlun Wanwei Technology Co., Ltd. is running the SkyWork AI technology release week from August 11 to August 15, unveiling a new model each day and focusing on cutting-edge models for core multi-modal AI scenarios [1].
- So far, Kunlun Wanwei has released the SkyReels-A3, Matrix-Game2.0, and Matrix-3D models [1].
- On August 13, Kunlun Wanwei officially open-sourced the "Skywork UniPic 2.0" model, which aims to provide an efficient training and inference framework for unified multi-modal modeling [1].

Group 2
- The "Skywork UniPic 2.0" model consists of three core modules: image generation and editing, unified model capabilities, and post-training for image generation and editing [1].
- The image generation and editing module has been improved to accept both text and image inputs, and its capabilities are expanded by training on high-quality image generation and editing data [2].
- The unified model capability is achieved by freezing the image generation and editing module and using a multi-modal model (Qwen2.5-VL-7B) together with a pre-trained connector to build integrated understanding, generation, and editing capabilities [2].
- To improve overall performance, a progressive dual-task reinforcement strategy based on Flow-GRPO has been designed for post-training, allowing generation and editing tasks to be optimized collaboratively without interfering with each other [2].
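Flow-GRPO builds on GRPO, whose central step is scoring each sampled output relative to the other samples drawn for the same prompt. The sketch below shows only that group-relative advantage step in plain Python. It is not Skywork's implementation: the function name, the use of the population standard deviation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four edits sampled from one prompt.
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)
print([round(a, 3) for a in advantages])
```

Samples scoring above the group mean get a positive advantage and are reinforced, while those below the mean are discouraged. No separate value network is needed, which is one reason this family of methods is attractive for post-training.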
Kunlun Wanwei: "Skywork UniPic 2.0" Model Officially Open-Sourced
Core Viewpoint - Kunlun Wanwei has officially open-sourced the "Skywork UniPic 2.0" model, an efficient training and inference framework for unified multimodal modeling that focuses on lightweight generation and editing modules and on joint training with multimodal understanding models [1].

Group 1: Product Development
- The "Skywork UniPic 2.0" model integrates the DiT and autoregressive paradigms, enhancing the capabilities of multimodal applications [1].
- The company has previously open-sourced several state-of-the-art (SOTA) large models, including SkyReels-V1, the first video generation model for AI short-drama creation; SkyReels-V2, the world's first infinite-length movie generation model, built on a diffusion forcing framework; and SkyReels-A3, an audio-driven portrait video generation model [1].

Group 2: Industry Impact
- The open-sourcing of "Skywork UniPic 2.0" aims to help developers and researchers get started quickly and build multimodal applications, promoting efficiency, quality, and unification in multimodal generation models [1].