VeOmni

AI News Roundup: AgiBot Launches the Genie Envisioner Robot World Model Platform; Zhipu Releases the GLM-4.5V Visual Reasoning Model
China Post Securities· 2025-08-25 11:47
- The Genie Envisioner platform introduces a video-centric world-modeling paradigm that models robot-environment interactions directly in the visual space, preserving spatial structure and temporal-evolution information. This approach improves cross-domain generalization and long-horizon task execution, achieving a 76% success rate on long-step tasks such as folding cardboard boxes, versus 48% for the π0 model[12][13][16]
- The Genie Envisioner platform comprises three core components: GE-Base, a multi-view video world foundation model trained on 3,000 hours of real robot data; GE-Act, a lightweight 160M-parameter action decoder enabling real-time control; and GE-Sim, a hierarchical action-conditioned simulator for closed-loop policy evaluation and large-scale data generation[16][17][19]
- The GLM-4.5V visual reasoning model, with 106B total parameters and 12B activated parameters, achieves state-of-the-art (SOTA) performance across 41 multimodal benchmarks spanning image, video, and document understanding as well as GUI-agent tasks. It incorporates 3D-RoPE and bicubic-interpolation mechanisms to strengthen 3D spatial-relationship perception and high-resolution adaptability[20][21][22]
- GLM-4.5V employs a three-stage training strategy: pretraining on large-scale multimodal corpora, supervised fine-tuning with chain-of-thought samples, and reinforcement learning combining RLVR and RLHF. This layered training yields strong document-processing capabilities and emergent abilities such as generating structured HTML/CSS/JavaScript code from screenshots or videos[23][24][26]
- VeOmni, a fully modular multimodal training framework, decouples model definition from distributed parallel logic, enabling flexible parallel strategies such as FSDP, HSDP+SP, and EP (see the illustrative sketch after this list). It achieves 43.98% MFU for 64K-sequence training, supports sequence lengths up to 192K, and is reported to reduce engineering complexity and cut development time by over 90%[27][28][31]
- VeOmni introduces asynchronous sequence parallelism (Async-Ulysses) and COMET technology for MoE models, achieving linear scalability of training throughput for 30B-parameter models at sequence lengths up to 160K. It also integrates dynamic batching and FlashAttention to minimize memory waste and optimize operator-level recomputation[31][32][34]
- Skywork UniPic 2.0, a unified multimodal framework, integrates image understanding, text-to-image (T2I) generation, and image-to-image (I2I) editing within a single model. It employs a progressive dual-task reinforcement strategy (Flow-GRPO) that optimizes image editing and T2I tasks sequentially, achieving strong results on benchmarks such as GenEval and GEdit-EN[35][38][39]
- UniPic 2.0 leverages Skywork-EditReward, an image-editing-specific reward model that provides pixel-level quality scores. The design enables precise recognition of image elements and generation of corresponding textual descriptions, scoring 83.5 on MMBench, comparable to 19B-parameter models[38][42][43]
- FlowReasoner, a query-level meta-agent framework, dynamically generates a personalized multi-agent system for each query. It uses GRPO reinforcement learning with multi-objective reward mechanisms, reaching 92.15% accuracy on the MBPP dataset and outperforming baselines such as Aflow and LLM-Blender[63][64][68]
- FlowReasoner follows a three-stage training process: distilling synthetic reasoning data, supervised fine-tuning (SFT) for workflow generation, and reinforcement learning with external feedback for further capability gains. It demonstrates robust generalization, maintaining high accuracy even when the base worker model is replaced[66][68][69]
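VeOmni's central engineering claim above is that model definition is decoupled from distributed-parallel logic, so the same model code can be trained under FSDP, HSDP+SP, or EP by switching configuration. The snippet below is not VeOmni's actual API; it is a minimal PyTorch sketch of that decoupling idea, in which a plain `nn.Module` stays free of parallelism code and a small, hypothetical `ParallelConfig`/`wrap_model` pair applies the chosen strategy from the outside.

```python
# Illustrative sketch only: ParallelConfig, build_model, and wrap_model are
# hypothetical names, not VeOmni's API. The point is that the model definition
# contains no parallelism logic; the strategy is applied from a config.
from dataclasses import dataclass

import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
from torch.nn.parallel import DistributedDataParallel as DDP


@dataclass
class ParallelConfig:
    strategy: str = "fsdp"  # "fsdp" | "hsdp" | "ddp"


def build_model() -> nn.Module:
    # Plain model code, unaware of how it will be parallelized.
    return nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))


def wrap_model(model: nn.Module, cfg: ParallelConfig) -> nn.Module:
    # The distributed strategy is chosen by configuration, not by editing the model.
    if cfg.strategy == "fsdp":
        return FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)
    if cfg.strategy == "hsdp":
        # Hybrid sharding: shard parameters within a node, replicate across nodes.
        return FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
    if cfg.strategy == "ddp":
        return DDP(model)
    raise ValueError(f"unknown strategy: {cfg.strategy}")


if __name__ == "__main__":
    # Launch with e.g.: torchrun --nproc_per_node=2 this_script.py
    dist.init_process_group(backend="gloo")
    wrapped = wrap_model(build_model(), ParallelConfig(strategy="fsdp"))
    print(type(wrapped).__name__)
```

Sequence parallelism (the SP in HSDP+SP) and expert parallelism (EP) would slot in the same way, as additional branches selected by configuration rather than changes to the model definition itself.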
ByteDance Suddenly Open-Sources Seed-OSS: 512K Context, 4x the Length of Mainstream Models, with Record-Breaking Reasoning
量子位· 2025-08-21 02:36
Core Viewpoint
- ByteDance has launched an open-source large model named Seed-OSS-36B, featuring 36 billion parameters, which aims to compete with existing models like OpenAI's GPT-OSS series [1][3][4].
Model Features
- Seed-OSS-36B boasts a native context window of 512K tokens, significantly larger than the 128K offered by mainstream models like DeepSeek V3.1, allowing it to handle complex tasks such as legal document review and long-report analysis [5][6][8].
- The model introduces a "Thinking Budget" mechanism, enabling users to set a token limit on the model's reasoning depth, which can be adjusted based on task complexity [9][10][12].
- The architecture spans 36 billion parameters and 64 layers, and uses RoPE position encoding, the GQA attention mechanism, RMSNorm normalization, and the SwiGLU activation function (a generic sketch of these components follows this section) [13][14].
Performance Metrics
- Seed-OSS-36B-Base achieved a score of 65.1 on the MMLU-Pro benchmark, outperforming Qwen2.5-32B-Base, which scored 58.5 [16].
- The model scored 87.7 on the BBH reasoning benchmark, setting a new record for open-source models, and demonstrated strong performance in math and coding tasks [17][18].
- The instruction-tuned version, Seed-OSS-36B-Instruct, scored 91.7 on the AIME24 math competition, ranking just below OpenAI's OSS-20B [20].
Development Background
- The ByteDance Seed team, established in 2023, aims to create advanced AI foundation models and has released several impactful projects, including Seed-Coder and BAGEL, which address a range of AI tasks [21][22][23].
- The team has also developed VeOmni, a distributed training framework, and Seed LiveInterpret, an end-to-end simultaneous interpretation model [24][25].
Open Source Contribution
- With the release of Seed-OSS, ByteDance adds a significant player to the domestic open-source base model landscape, promoting further advancement of AI technology [26].
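The architecture bullet above lists components that are standard across recent open models (RoPE, grouped-query attention, RMSNorm, SwiGLU). For readers who want to see how those pieces fit together, here is a generic, self-contained PyTorch sketch of a pre-norm block built from them; it is not ByteDance's implementation, and the dimensions and class names are illustrative only.

```python
# Generic sketch of RMSNorm, SwiGLU, and grouped-query attention with rotary
# position embeddings (RoPE). Not Seed-OSS code; sizes are toy values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the root-mean-square; no mean subtraction, no bias.
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SiLU-gated feed-forward network.
        return self.down(F.silu(self.gate(x)) * self.up(x))


def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(q, k, positions, head_dim, base=10000.0):
    # Rotate query/key head dimensions by position-dependent angles.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = positions[:, None].float() * inv_freq[None, :].to(positions.device)
    cos = torch.cat((angles.cos(), angles.cos()), dim=-1)
    sin = torch.cat((angles.sin(), angles.sin()), dim=-1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin


class GQAAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q, k, torch.arange(s, device=x.device), self.head_dim)
        # Grouped-query attention: each KV head is shared by several query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, s, -1))


if __name__ == "__main__":
    x = torch.randn(1, 8, 256)                      # (batch, seq, hidden)
    attn = GQAAttention(dim=256, n_heads=8, n_kv_heads=2)
    mlp = SwiGLU(256, 512)
    h = x + attn(RMSNorm(256)(x))                   # pre-norm attention sublayer
    h = h + mlp(RMSNorm(256)(h))                    # pre-norm SwiGLU sublayer
    print(h.shape)                                  # torch.Size([1, 8, 256])
```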
[AI Industry Tracking] Baichuan Open-Sources the Medical Large Model Baichuan-M2
GUOTAI HAITONG SECURITIES· 2025-08-19 09:42
Report Summary
1. Report Industry Investment Rating
There is no information about the industry investment rating in the provided content.
2. Core View of the Report
The report provides a comprehensive overview of the latest trends in the AI industry, including company performance, application scenarios, model releases, and technological breakthroughs, demonstrating the rapid development and wide-ranging impact of AI technology [9][10][11].
3. Summary by Directory
AI Industry Dynamics
- Tencent's AI profitability began to show in Q2 2025: revenue was 184.5 billion yuan, up 15% year-on-year, and net profit was 55.6 billion yuan, up 17%. AI has become an important driver of its core business [9].
AI Application Information
- The "National Railway Value Engineering Multi-modal Large Model Application Platform", jointly developed by SenseTime and the First Survey and Design Institute of China Railway, was launched, offering five functions with over 90% accuracy on 28,000 test questions and covering 420GB of data [10].
AI Large Model Information
- Shanghai Jiao Tong University released the native brain-inspired large model BriLLM, which can handle unbounded context and offers full interpretability, with a nearly 90% reduction in parameters through sparsification [11].
- The University of Hong Kong and other institutions open-sourced the OpenCUA framework; its flagship model OpenCUA-32B achieved a 34.8% success rate on the OSWorld-Verified benchmark [12].
- ByteDance open-sourced the full-modal training framework VeOmni, which can cut engineering development time by over 90% while delivering high throughput [13][14].
- Zhipu released the GLM-4.5V multi-modal model, achieving SOTA results in 41 of 42 public visual multi-modal tasks [15].
- Alibaba's DAMO Academy open-sourced the Rynn series of models to address the data, model, and robot-adaptation problems in embodied-intelligence development [16].
- Baichuan released the open-source medical large model Baichuan-M2, which outperformed most models in the OpenAI HealthBench evaluation and supports single-card deployment on an RTX 4090 [17].
Technology Frontier
- FlashRL, the first open-source reinforcement learning solution supporting 8-bit rollout, was released; it can speed up training by 1.7x without performance loss [19].
- China developed its first hybrid pollination robot, which can reduce breeding costs, shorten breeding cycles, and improve efficiency [22].
TMT Industry Weekly (Week 3 of August): Domestic Wafer Foundries' Q2 2025 Results Beat Expectations - 20250818
Century Securities· 2025-08-18 01:29
Investment Rating
- The report provides a positive outlook for the domestic wafer foundry industry, indicating an "Outperform" rating for the sector [2].
Core Insights
- The performance of domestic wafer foundries, specifically SMIC and Hua Hong Semiconductor, exceeded expectations in Q2 2025. SMIC reported revenue of $2.209 billion, a slight 1.7% quarter-on-quarter decline but better than the company's guidance of a 4-6% decline; Hua Hong Semiconductor reported revenue of $566 million, a 4.6% quarter-on-quarter increase, in line with its guidance [3][4].
- Both companies are expected to maintain high utilization rates, reflecting a gradual recovery in domestic semiconductor demand driven primarily by analog-chip orders. The report anticipates a resurgence in demand for power semiconductors in Q3 2025 [3][4].
- High utilization rates and an improved product mix are effectively offsetting increased depreciation costs, leading to a quarter-on-quarter upward trend in overall gross margins [3][4].
Weekly Market Review
- The TMT sector posted significant gains in the week of August 11-15: telecommunications rose 7.66%, electronics 7.02%, and computers 5.38%, all outperforming the CSI 300 index, which rose 2.37% [3][4].
- Notable sub-sectors included communication network equipment and devices, up 12.40%, and passive components, up 12.32% [3][4].
Industry News and Key Company Announcements
- The report highlights several key events in the industry, including the upcoming Baidu Cloud Intelligence Conference and the launch of new hardware by Google [15][16].
- It also notes significant developments in AI and data infrastructure, with central state-owned enterprises increasing their investments in AI applications and data trading [18][19].
- The AI server market is experiencing substantial growth, with companies like Hon Hai reporting a 60% year-on-year increase in AI server revenue for Q2 2025 [21][22].
Kling AI Replaces Its Tech Department Head; Unitree Robot's "Hit-and-Run" Trends on Social Media; G.E.M. Reveals a 10x Return on an AI Company Investment | AI Weekly
AI前线· 2025-08-17 05:33
Group 1
- The first humanoid robot sports event took place on August 14, featuring 280 teams from 16 countries and showcasing the capabilities of humanoid robots across various competitions [3][4]
- The Unitree H1 robot won the 1500-meter race with a time of 6:34.40, taking the event's first gold medal [3]
- The Tiangong robot team lost to Unitree in both the 1500-meter and 400-meter races, with Tiangong's CTO expressing a desire to learn from Unitree's performance [3][4]
Group 2
- A corruption scandal involving DeepSeek's parent company has emerged, revealing that over 1.18 billion yuan was illicitly obtained through a kickback scheme over six years [8][9]
- Reports indicate that DeepSeek's next-generation model, R2, will not be released in August as previously speculated, with the focus instead on iterative improvements to existing products [10]
- The company has faced challenges from supply-chain issues related to AI chips, affecting its development timeline [10]
Group 3
- Manus faces the potential forced withdrawal of a $75 million investment from Benchmark due to regulatory scrutiny over compliance with U.S. restrictions on investments in Chinese AI firms [11]
- Following the investment controversy, the company has shifted its focus from domestic expansion to international markets, particularly Singapore [11][12]
Group 4
- Kuaishou announced a leadership change in its AI division, with Gai Kun taking over the technical department, amid rumors of the previous head's departure [12][13]
- The CEO of Leifen publicly criticized a former employee over product performance comparisons, pointing to internal conflicts and challenges for the company's public image [14]
Group 5
- OpenAI employees are seeking to sell approximately $6 billion in stock at a $500 billion valuation, indicating strong investor interest despite the company's current losses [15]
- The company is also exploring advertising as a revenue stream while keeping its focus on subscription growth [38]
Group 6
- Alibaba's legendary "sweeping monk" Cai Jingxian, Taobao's first programmer, has reportedly left the company, marking a significant personnel change [17][18]
- G.E. has launched a new open-source platform for robotics, aiming to integrate various aspects of robot control and learning [36]
Group 7
- The National Data Bureau reported a dramatic increase in daily token consumption across AI applications, reflecting rapid growth in the sector [30]
- Alibaba's international platform has gained popularity with its AI agent, prompting expansion plans to accommodate increased demand [31]