Multimodal Generation Models

Over Half of Global Venture Capital Flows into AI! Qiming Venture Partners Releases Its Top Ten AI Outlook for 2025
Zheng Quan Shi Bao Wang· 2025-07-28 07:38
Core Insights
- AI startups attracted 53% of global venture capital funds in the first half of 2025, indicating a significant investment shift toward the AI sector [1]
- General video models are expected to emerge within 12-24 months, revolutionizing video content generation and interaction [1][4]
- The AI BPO model is projected to achieve commercialization breakthroughs in the next 12-24 months, shifting from "delivering tools" to "delivering results" [6]

Investment Trends
- Token consumption by leading US and Chinese models is growing rapidly, with Google and Doubao recording increases of 48x and 137x respectively, reflecting the dual drivers of stronger model capabilities and newly emerging applications [4]
- The AI investment landscape is evolving toward vertical applications, where startups leverage industry knowledge to differentiate themselves from larger companies [5]

Technological Advancements
- AI agents are expected to transition from "tool assistance" to "task undertaking", with the first true "AI employees" participating in core business processes [4]
- AI infrastructure is set to see advances in GPU production and new AI cloud chips, improving performance and reducing costs [6]

Market Applications
- AI applications are increasingly embedded in daily life, with healing and companionship becoming significant use cases by 2025 [5]
- The shift in AI interaction paradigms is expected to accelerate, reducing reliance on traditional devices and promoting the rise of AI-native super applications [6]
Training Data Slashed to 1/1200! Tsinghua & Shengshu Release a Domestic Video-Based Embodied Foundation Model, Reaching SOTA in Efficient Generalization of Complex Physical Manipulation
量子位 (QbitAI) · 2025-07-25 05:38
Core Viewpoint
- The article covers the Vidar model developed by Tsinghua University and Shengshu Technology, which enables robots to learn physical manipulation from ordinary video, achieving a significant leap from virtual training to real-world execution [3][27]

Group 1: Model Development and Capabilities
- Vidar builds on the Vidu base model, which is pre-trained on internet-scale video data and further trained on millions of heterogeneous robot videos, allowing it to generalize quickly to new robot types with only 20 minutes of real robot data [4][10]
- The model addresses the data scarcity and extensive multimodal-data requirements of current vision-language-action (VLA) models, significantly reducing the data needed for large-scale generalization [5][6]
- Vidar's architecture pairs a video diffusion model, which predicts task-specific videos, with an inverse dynamics model that decodes those videos into robotic arm actions [7][11]

Group 2: Training Methodology
- The team's embodied pre-training method combines a unified observation space, large-scale embodied-data pre-training, and minimal fine-tuning on the target robot to achieve precise control in video tasks [10]
- Performance was validated on the VBench video generation benchmark, showing significant improvements in subject consistency, background consistency, and imaging quality after embodied-data pre-training [11][12]

Group 3: Action Execution and Generalization
- Task-agnostic actions make data collection and cross-task generalization easier, eliminating the need for human supervision and annotation [13][15]
- The automated task-agnostic random actions (ATARA) method collects training data for previously unseen robots in just 10 hours, enabling full action-space generalization [15][18]
- Vidar demonstrated superior success rates across 16 common robotic tasks, excelling in particular at generalizing to unseen tasks and backgrounds [25][27]

Group 4: Future Implications
- Vidar's advances lay a solid technical foundation for future service robots operating in complex real-world environments such as homes, hospitals, and factories [27]
- The model serves as a critical bridge between virtual algorithm training and real-world autonomous action, deepening AI's integration into physical tasks [27][28]
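The two-stage design described for Vidar, where a video model proposes future frames and an inverse dynamics model (IDM) recovers the action between consecutive frames, can be sketched as follows. This is a minimal toy illustration of that general pattern, not Vidar's actual API: the class names, the toy frame perturbation, and the 7-dimensional action vector are all illustrative assumptions.

```python
# Toy sketch of a video-prediction-plus-IDM control loop.
# All names and internals here are hypothetical stand-ins.
import numpy as np

class VideoPredictor:
    """Stand-in for a pre-trained video diffusion model."""
    def predict_frames(self, observation, horizon=4):
        # Pretend each predicted future frame is a small
        # perturbation of the current observation.
        rng = np.random.default_rng(0)
        return [observation + 0.01 * rng.standard_normal(observation.shape)
                for _ in range(horizon)]

class InverseDynamicsModel:
    """Stand-in IDM: maps a (frame_t, frame_t+1) pair to an action."""
    def infer_action(self, frame_a, frame_b, action_dim=7):
        # Toy rule: summarize the pixel change as a fixed-size
        # action vector (a real IDM would be a learned network).
        diff = (frame_b - frame_a).ravel()
        return diff[:action_dim]

def plan_actions(observation, predictor, idm):
    """Predict future frames, then decode one action per transition."""
    frames = [observation] + predictor.predict_frames(observation)
    return [idm.infer_action(a, b) for a, b in zip(frames, frames[1:])]

obs = np.zeros((8, 8))  # toy "camera image"
actions = plan_actions(obs, VideoPredictor(), InverseDynamicsModel())
print(len(actions))  # one action per predicted frame transition
```

The key design point this mirrors is the separation of concerns the article describes: the video model carries the task knowledge (what should happen next visually), while the IDM is task-agnostic and only answers "what action produced this frame change", which is why it can be trained from automatically collected random-action data.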
Zhipu and Shengshu Technology Enter Strategic Partnership
news flash· 2025-04-27 06:10
Core Insights
- The strategic partnership between Zhipu and Shengshu Technology leverages their respective strengths in large language models and multimodal generation models for collaborative development and integration of products and solutions [1]

Group 1: Strategic Collaboration
- Zhipu and Shengshu Technology will collaborate on joint research and development, product linkage, solution integration, and industry synergy [1]
- The agreement includes integrating Zhipu's MaaS platform with Shengshu Technology's Vidu API [1]