Report on the Top Ten AI Technology Trends for 2026
Sohu Finance (搜狐财经) · 2026-01-12 08:10
Core Insights
- The article traces the evolution of artificial intelligence (AI) from an initial phase of rapid expansion to a more mature stage characterized by cognitive enhancement, collaborative agent clusters, and deep industry integration, outlining ten core trends that shape the new blueprint of the intelligent era [1].

Group 1: AI Model Evolution
- The evolution of foundation models is described as machines approaching human cognitive limits, with the "pre-training + post-training" paradigm validated by the industry since late 2024 [1].
- Breakthroughs in the multimodal field hinge on the transition from "Next Token Prediction" to "Next-State Prediction" (NSP), enabling AI to learn physical dynamics, temporal continuity, and causal relationships the way humans do (a schematic contrast between the two objectives follows this summary) [1].

Group 2: Industry Trends and Developments
- In 2025 the embodied-intelligence industry entered a "shakeout" phase: more than 230 embodied-intelligence companies in China, including over 100 humanoid-robot firms, face significant technical challenges and funding requirements [2].
- The commercial focus has shifted from laboratory validation to mass production, with humanoid-robot sales surpassing 10,000 units and large-scale orders becoming common [2].

Group 3: Multi-Agent Systems (MAS)
- AI applications are evolving from single-agent systems (SAS) to multi-agent systems (MAS); SAS deployments currently account for 63% of agent applications, concentrated in areas such as customer service and code generation [3].
- One cited report indicates that 57% of organizations have deployed agents to handle multi-stage workflows, a figure projected to rise to 81% by 2026 [3].

Group 4: Communication Protocols and AI for Science
- The core breakthrough enabling MAS is the unification of communication protocols: the MCP and A2A protocols have been brought under the Linux Foundation, supporting complex multi-agent applications (a minimal MCP-style request is sketched after this summary) [4].
- AI for Science (AI4S) has evolved from a supporting tool into an "AI Scientist" capable of executing a complete research workflow, marking a significant shift in scientific research methodology [4].

Group 5: Global Competition and Infrastructure
- International competition is intensifying, with the U.S. launching the "Genesis Mission" initiative in November 2025 to accelerate the large-scale implementation of AI4S [5].
- China shows strength in applications but lags in foundational infrastructure such as computing power, data, and models; the national data center held 4.6 PB of data as of 2025 [5].

Group 6: Consumer AI and Vertical Markets
- Consumer AI competition is converging on "Super Apps" that integrate many functions into a single platform, with apps like ChatGPT and Gemini reported to exceed 100 million daily active users [5].
- Vertical markets show significant potential: multimodal models deliver high value despite low usage frequency, as illustrated by health-management apps such as Ant Group's Aifeng [6].

Group 7: Challenges and Future Outlook
- Many enterprise-facing (ToB) AI applications remain at the proof-of-concept (PoC) stage, with 95% of GenAI pilot projects failing to produce measurable impact, largely due to data-quality and integration challenges [6].
- The second half of 2026 is expected to be the critical window for rolling out minimum viable products (MVPs) of ToB applications, with a clear implementation path spanning data governance and API connections [7].
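To make the Group 1 contrast concrete: writing x_t for a single token and s_t for a whole multimodal state (for example, a block of interleaved vision and text tokens), the two objectives can be sketched as below. The notation is ours, not the report's.

```latex
\mathcal{L}_{\mathrm{NTP}}(\theta) = -\sum_{t} \log p_\theta\left(x_{t+1} \mid x_{\le t}\right)
\qquad
\mathcal{L}_{\mathrm{NSP}}(\theta) = -\sum_{t} \log p_\theta\left(s_{t+1} \mid s_{\le t}\right)
```

Formally the two look alike; the substance is in the prediction unit. NSP asks the model to roll an entire scene forward one state at a time, which is where the claimed grasp of physical dynamics, temporal continuity, and causality comes from.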
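For a concrete sense of what the Group 4 protocol unification standardizes: MCP is built on JSON-RPC 2.0, and a tool invocation travels as a request shaped like the sketch below. The "get_weather" tool and its arguments are invented for illustration; only the envelope (jsonrpc / method / params) follows the published MCP shape.

```python
import json

# A minimal MCP-style tool-call request (JSON-RPC 2.0 envelope).
# The "tools/call" method and the params layout follow the published
# MCP specification; the tool name and arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",              # hypothetical tool
        "arguments": {"city": "Beijing"},   # hypothetical arguments
    },
}
print(json.dumps(request, indent=2))
```

An A2A exchange wraps agent-to-agent task messages in a similar JSON envelope; standardizing these envelopes is what lets agents from different vendors participate in one workflow.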
Group 8: Synthetic Data and Cost Reduction
- Synthetic data is emerging as a crucial resource for the AI 2.0 era, addressing the shortage of real-world data; NVIDIA, for example, uses synthetic datasets to optimize 3D object detection [8].
- The cost of inference has fallen sharply: the price per million tokens dropped from $20 to $0.07 between November 2022 and October 2024, a roughly 280-fold reduction in about two years (a worked check follows below) [8].
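As a quick check on the Group 8 cost figures, using only the numbers quoted above:

```python
# Fold reduction implied by the quoted inference prices.
cost_2022 = 20.00  # USD per million tokens, November 2022 (quoted)
cost_2024 = 0.07   # USD per million tokens, October 2024 (quoted)
print(f"reduction: ~{cost_2022 / cost_2024:.0f}x")  # ~286x, i.e. the roughly 280-fold figure
```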
Just Announced: BAAI's Wujie·Emu3.5 (悟界·Emu3.5) Debuts with Native World-Modeling Capability
机器之心 · 2025-10-30 08:52
Core Insights
- The article covers the release of Emu3.5, the latest multimodal model from the Beijing Academy of Artificial Intelligence (BAAI), highlighting its capabilities and innovations in the field of AI [3][4][6].

Model Overview
- Emu3.5 is positioned as a "Multimodal World Foundation Model," distinguished from other generative models by its inherent world-modeling capability [4][5].
- The model was trained on over 10 trillion multimodal tokens, sourced primarily from internet videos totaling approximately 790 years of footage, allowing it to internalize the dynamic laws of the physical world [5][16].

Technological Innovations
- Emu3.5 introduces "Discrete Diffusion Adaptation" (DiDA), which speeds up image inference by nearly 20x with minimal performance loss, making it competitive with top closed-source diffusion models (a toy sketch of this style of parallel decoding follows this summary) [6][24].
- The architecture is a 34-billion-parameter dense transformer whose training objectives are unified under "Next-State Prediction" [11][17].

Performance and Capabilities
- Emu3.5 reports state-of-the-art results across image editing and generation, visual narrative creation, and visual guidance, outperforming competitors such as Google's Gemini-2.5-Flash-Image [28][35].
- The model can generate coherent visual narratives and step-by-step visual tutorials, a significant advance over traditional multimodal models [13][14].

Training Process
- Training proceeds in four core stages: large-scale pre-training, fine-tuning on high-quality datasets, large-scale multimodal reinforcement learning, and efficient autoregressive inference acceleration [17][21][22][24].
- The training corpus includes a vast amount of interleaved vision-language data, from which the model learns physical dynamics and causality (a sketch of such an interleaved stream also follows this summary) [16][41].

Future Implications
- Emu3.5 is positioned as a foundation for future work on embodied intelligence, capable of generating diverse virtual environments and task-planning data [39][41].
- The open-sourcing of Emu3.5 is expected to provide a robust new foundation for the global AI research community [7][45].
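On the DiDA speedup: the article credits the near-20x faster image inference to replacing token-by-token autoregression with parallel refinement. The toy below implements the generic confidence-based parallel-decoding idea (in the style of masked-token decoders such as MaskGIT) with a dummy random predictor, purely to show why the forward-pass count drops; it is not Emu3.5's actual algorithm, which is specified in its technical report.

```python
import numpy as np

# Toy parallel-refinement decoder: predict every masked position at once,
# keep the most confident predictions, re-mask the rest, repeat. A random
# "model" stands in for the network; only the pass-count logic matters.

rng = np.random.default_rng(0)
VOCAB, N_TOKENS, STEPS = 256, 64, 8
MASK = -1

def dummy_predict(tokens):
    """Stand-in for the model: a (token, confidence) guess per position."""
    preds = rng.integers(0, VOCAB, size=tokens.shape)
    conf = rng.random(size=tokens.shape)
    return preds, conf

tokens = np.full(N_TOKENS, MASK)
for step in range(STEPS):
    masked = tokens == MASK
    preds, conf = dummy_predict(tokens)
    # Commit an equal share of the most confident masked positions per step.
    quota = int(np.ceil(masked.sum() / (STEPS - step)))
    order = np.argsort(np.where(masked, -conf, np.inf))  # most confident first
    tokens[order[:quota]] = preds[order[:quota]]

assert (tokens != MASK).all()
print(f"{N_TOKENS} image tokens decoded in {STEPS} parallel passes "
      f"instead of {N_TOKENS} sequential ones (~{N_TOKENS // STEPS}x fewer)")
```

With a few thousand tokens per image and a handful of refinement passes, the pass count falls by one to two orders of magnitude, which is the regime in which a near-20x wall-clock speedup becomes plausible.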
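The article does not spell out Emu3.5's data layout, but a generic interleaved vision-language stream for next-state prediction can be sketched as follows. Every name here (the boundary markers, the vision-token IDs, the captions, the `state` helper) is invented for illustration; the model's actual tokenization is defined in its technical report.

```python
# Sketch of an interleaved vision-language training stream for
# next-state prediction: video frames and their accompanying text are
# tokenized into one sequence, and the model is trained to continue it.

BOI, EOI = "<boi>", "<eoi>"  # hypothetical image-boundary markers

def state(frame_tokens, caption):
    """One multimodal 'state': an image-token block plus its text."""
    return [BOI, *frame_tokens, EOI, *caption.split()]

stream = (
    state(["v017", "v203", "v009"], "a hand reaches toward the cup")
    + state(["v017", "v088", "v004"], "the cup tips and water spills")
)

# Training pairs: predict each element from its prefix, so vision and
# text tokens fall under one next-state-prediction objective.
pairs = [(stream[:i], stream[i]) for i in range(1, len(stream))]
print(len(pairs), "prediction targets from", len(stream), "tokens")
```

Because consecutive states come from consecutive moments of real video, learning to continue such streams is, in the article's framing, how the model picks up physical dynamics and causality rather than just next-word statistics.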