The World Is Too Small for All These World Models
36Ke· 2025-12-04 09:29
World models have already become as chaotic as the world itself.

OpenAI points at videos generated by Sora and calls them a "world simulator"; Yann LeCun points at Sora and calls it pixel hallucination, arguing that a true world model should be an "abstract brain that predicts the future"; Google DeepMind says Genie3 is an "interactive general-purpose world model"; and Fei-Fei Li says "spatial intelligence" is the real answer.

The real world is singular and objective, yet everyone in the AI community seems to be building a "world model" of their own.

Their definitions point in opposite directions, but these loudly feuding heavyweights do agree on one basic judgment: large language models will hit a ceiling sooner or later, and world models are the necessary path to AGI.

Large language models went through parameter inflation after GPT-3.5; world models, before their technical routes have even converged, have already gone through conceptual inflation.

A world model is a basket: anything can be thrown in

The confusion around "world models" stems from the fact that the term names a goal, giving AI the ability to understand the laws of the external world and predict how it changes, rather than a specific technical path.

The concept was the first thing to get muddled.

The idea behind world models can be traced back to the "mental model" proposed by cognitive scientist Kenneth Craik in 1943: the brain makes predictions by building a miniature model of the external world. In other words, there is a mental model in our heads that not only processes what we currently see ...
At the Intersection of Content Creators and Robots: On the Evolution of 3D Digital Humans
36Ke· 2025-10-29 11:24
Core Insights
- The rise of 3D digital humans is transforming content creation and interaction, moving from rigid, scripted avatars to dynamic, responsive entities capable of real-time expression and movement [1][2]
- The technology behind 3D digital humans is evolving, with significant advancements in AI and rendering techniques that reduce costs and improve quality [7][36]
- The integration of 3D digital humans into various industries, including gaming and film, is expected to grow, with potential applications in customer service and interactive experiences [1][12]

Group 1: Technological Advancements
- 3D digital humans have progressed from basic, scripted models to sophisticated entities that can generate voice, expressions, and movements in real time [1][2]
- The introduction of models like Sora2 demonstrates the potential for generating human-like actions and interactions, although challenges remain in error correction and precise control [3][5]
- The combination of 2D and 3D training techniques is being explored to enhance the expressiveness and accuracy of digital humans [5][7]

Group 2: Cost Efficiency
- The cost of creating and deploying 3D digital humans is significantly lower than traditional methods, with estimates suggesting costs are a fraction of those associated with large models [7][36]
- AI rendering and calculation techniques have enabled the use of inexpensive hardware to run complex 3D models, making the technology more accessible [36][38]
- The ability to generate high-quality 3D content without the need for expensive graphics engines or hardware is a game-changer for the industry [36][39]

Group 3: Industry Applications
- 3D digital humans are being positioned as the next generation of content producers, with applications in live streaming, customer service, and entertainment [2][12]
- The technology is expected to bridge the gap between virtual and physical interactions, enhancing user experiences in various sectors [1][12]
- The potential for 3D digital humans to be integrated into VR and AR environments opens new avenues for immersive experiences [5][12]

Group 4: Data and Model Development
- The accumulation of high-quality 3D animation data is crucial for training effective AI models, with the company claiming to have over 1,000 hours of such data [24][25]
- The integration of video data with 3D data is being pursued to improve model training and enhance the realism of digital human interactions [25][26]
- The development of a "3D action model" is underway, which aims to automate the generation of 3D motion data for robots, further bridging the gap between digital and physical realms [46][48]

Group 5: Future Prospects
- The company aims to transition from a 3D digital human provider to a platform that enables other developers to create applications using their technology [12][13]
- The potential for 3D digital humans to impact the film industry is acknowledged, although challenges in achieving the high-quality standards of Hollywood remain [33][34]
- The ongoing evolution of AI and robotics suggests a promising future for the integration of 3D digital humans in various applications, with expectations for significant advancements in the coming years [57][59]
The AI Rally Is Underway: These Niche Segments Deserve Attention
Mei Ri Jing Ji Xin Wen· 2025-10-23 01:39
Group 1
- Global AI development has accelerated significantly this year, particularly in China and North America, with the recent release of GPT-5 showing notable performance improvements and a rapid decrease in token prices, enhancing cost-effectiveness [1]
- Advanced models have been released both domestically and internationally, such as Google DeepMind's Genie3, which maintains consistency in generated content and better understands physical laws, leading to more logical outputs [1]
- The rapid development of foundational models for text-to-image and text-to-video generation is evident, indicating a strong growth trajectory in the AI sector [1]

Group 2
- The performance growth of companies this year is primarily driven by hardware, with NVIDIA's new Blackwell architecture products seeing rapid deployment, averaging around 1,000 units per week for the GB200 NVL72 system, translating to over 70,000 GPUs sold weekly [2]
- The rapid deployment of GPUs is positively impacting related sectors in the A-share market, such as optical modules and PCBs, which are competitive globally and hold significant market shares [2]
- The market size for NVIDIA's Blackwell architecture chips is expected to grow quickly, leading to strong performance growth for related A-share companies, making their upcoming quarterly reports worth monitoring [2][3]

Group 3
- Key segments with high growth potential in the A-share market include optical modules, PCBs, and server ODMs, with many companies being global leaders in their fields [3]
- The rapid iteration of computing chips, such as NVIDIA's GPUs, is expected to continue, with significant increases in average selling prices (ASP) during each iteration, indicating strong growth momentum in these segments [3]
- Emerging fields like liquid cooling electronics and fiber/copper connections also present investment opportunities, suggesting a deeper exploration of these areas for those optimistic about the AI market [3]
Jinqiu Fund-Led Portfolio Company Manifold AI (流形空间) Raises About 100 Million Yuan Across Two Rounds to Build Next-Generation Embodied-Intelligence World Models | Jinqiu Spotlight
锦秋集· 2025-10-20 12:18
Core Insights
- Jinqiu Fund has completed an investment in Manifold AI, focusing on world models and embodied intelligence, with a total of over 100 million yuan raised in two funding rounds [2][4]
- Jinqiu Fund emphasizes a long-term investment philosophy, seeking groundbreaking technologies and innovative business models in the field of general artificial intelligence [3][16]

Investment Overview
- The recent angel round of financing for Manifold AI was led by Jinqiu Fund, with participation from co-investors including Chuangweiye and existing shareholder Inno Angel Fund [4]
- The seed round was led by Inno Angel Fund, with follow-on investment from the Waterwood Tsinghua Alumni Seed Fund [4]

Technological Focus
- Manifold AI's original embodied world model technology aims to drive the large-scale deployment of robotic brains, addressing the challenges of diverse bodies, limited data, and fragmented applications in general robotics [6][16]
- The company uses a World Model Action (WMA) approach, pre-training on vast amounts of ego-centric video data, which is expected to help physical-space intelligence emerge (a hedged sketch of this pre-train-then-act pattern follows this summary) [10][16]

Industry Context
- The rapid evolution of robotics and the need for autonomous operational capabilities are critical for large-scale implementation [6]
- The shift in technology strategies by companies like Tesla and Figure AI towards using extensive ego-centric video data for training reflects a broader trend in the industry [6][7]

Team and Leadership
- Manifold AI's core team is based in Beijing, with members having backgrounds in robotics and large models, and experience in developing AI products with millions of users [12]
- The founder and CEO, Dr. Wu Wei, has extensive management experience and previously led the development of the world model at SenseTime [13][16]

Future Outlook
- Jinqiu Fund anticipates exploring the next generation of embodied intelligent world models in collaboration with Manifold AI, as the industry moves towards a deeper understanding of machine interaction with the world [17]
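To make the WMA idea above concrete, here is a minimal, hypothetical sketch of the general "pre-train a world model on unlabeled ego-centric video, then attach a small action head" pattern. All module names, dimensions, and the two-phase schedule are illustrative assumptions and do not reflect Manifold AI's actual architecture.

```python
# Hypothetical sketch: pre-train a latent world model on video, then add an action head.
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Predicts the next latent frame from the current latent frame."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3 * 64 * 64, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.dynamics = nn.GRUCell(latent_dim, latent_dim)

    def forward(self, frames):
        # frames: (batch, time, 3*64*64) flattened ego-centric video clips
        z = self.encoder(frames)                      # (B, T, latent_dim)
        h = torch.zeros(z.size(0), z.size(2))
        preds = []
        for t in range(z.size(1) - 1):
            h = self.dynamics(z[:, t], h)             # roll the latent state forward
            preds.append(h)
        return torch.stack(preds, dim=1), z[:, 1:]    # predicted vs. actual next latents

class ActionHead(nn.Module):
    """Maps a latent state to a robot action; fitted on a small action-labelled set."""
    def __init__(self, latent_dim=256, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))

    def forward(self, latent):
        return self.net(latent)

# Phase 1: self-supervised next-latent prediction on unlabeled ego-centric video.
wm = LatentWorldModel()
opt = torch.optim.Adam(wm.parameters(), lr=1e-4)
video = torch.randn(8, 16, 3 * 64 * 64)               # stand-in for a video batch
pred, target = wm(video)
loss = nn.functional.mse_loss(pred, target.detach())
loss.backward()
opt.step()

# Phase 2: keep the pre-trained world model and fit a small action head on
# a much smaller set of action-labelled robot trajectories.
head = ActionHead()
action = head(pred[:, -1].detach())                    # action from the last imagined latent
print(action.shape)                                    # torch.Size([8, 7])
```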
"AI Godmother" Unveils Her Latest World Model
财联社· 2025-10-17 12:28
Group 1
- The article discusses the launch of a new real-time interactive 3D world model called RTFM (Real-Time Frame Model) developed by World Labs, founded by AI expert Fei-Fei Li. The model is designed around three key principles: efficiency, scalability, and durability, allowing it to run on a single H100 GPU to render persistent and consistent 3D worlds [2]
- World Labs emphasizes that as world model technology advances, the demand for computing power will increase significantly, surpassing the current requirements of large language models (LLMs). To achieve 4K+60FPS interactive video streaming, traditional video architectures would need to generate over 100,000 tokens per second, which is economically unfeasible with current computing infrastructure (a rough back-of-envelope check of this figure follows this summary) [2]
- The article highlights a strategic partnership between OpenAI and Broadcom to deploy 10 gigawatts of AI accelerators, which is expected to create a diversified computing power system for OpenAI, reducing reliance on a single supplier and driving down computing costs through competition [3]

Group 2
- The phenomenon known as the "Jevons Paradox" is noted, where advancements in AI model technology that improve computing efficiency can lead to an overall increase in the total consumption of computing resources. For instance, the DeepSeek R1 model, released earlier this year, demonstrates strong AI performance but is expected to increase the demand for computing resources [4]
- World Labs previously released the Marble model, which generates 3D worlds from a single image or text prompt, showcasing improved geometric structures and diverse styles compared to its predecessor. Fei-Fei Li has stated that the significance of world models lies in their ability to understand and reason about both textual information and the physical world's operational laws [4]
- Companies across the AI and terminal sectors are increasingly investing in world models, with xAI hiring experts from NVIDIA, and competitors like Meta and Google also focusing on this area. In China, robotics firms such as Yushu and Zhiyuan have open-sourced their world models [4]

Group 3
- Dongwu Securities notes that as computing power becomes cheaper and more accessible, developers will set more complex models and systems as new benchmarks, increasing parameters, context, and parallelism. While model architecture iterations may reduce the computing power required for a single inference or training run, video-generating models like Genie3 may require a significant increase in computing power to meet demand [5]
- The higher ceiling for AI computing power and an improved competitive landscape are expected to support a higher valuation framework for AI computing compared to 4G/5G, along with a stronger Beta [5]
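As a sanity check on the ">100,000 tokens per second" figure quoted above, the following back-of-envelope calculation assumes a hypothetical video tokenizer that maps each 32x32 pixel patch of a 4K frame to roughly one token; the patch size and per-patch token count are assumptions chosen for illustration, not World Labs' actual tokenizer settings.

```python
# Rough back-of-envelope check of the ">100,000 tokens/second" claim for 4K @ 60 FPS.
width, height, fps = 3840, 2160, 60

# Assumption: one token per 32x32 spatial patch after heavy compression.
patch = 32
tokens_per_frame = (width // patch) * (height // patch)   # 120 * 67 = 8,040
tokens_per_second = tokens_per_frame * fps                # ~482,400

print(f"tokens per frame:  {tokens_per_frame:,}")
print(f"tokens per second: {tokens_per_second:,}")
# Even with aggressive 32x32 patches, the naive token rate lands in the hundreds of
# thousands per second, consistent with the article's point that frame-by-frame token
# generation at 4K/60FPS is far beyond today's LLM-style serving economics.
```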
Sim2Real Can't Solve Embodied Intelligence's Data Dilemma
自动驾驶之心· 2025-10-03 03:32
Core Viewpoint
- The article discusses the ongoing debate in the field of embodied intelligence regarding the reliance on simulation efficiency versus real-world data, and the potential of world models to redefine the landscape of data utilization in this domain [4][8]

Group 1: Understanding the Sim-to-Real Gap
- The "Sim-to-Real gap" refers to the discrepancies between simulated environments and real-world scenarios, primarily due to incomplete simulations that fail to accurately replicate visual and physical details [8]
- Research indicates that the gap exists because simulation models do not fully capture the complexities of the real world, leading to limited generalization capabilities and a focus on specific scenarios [8][11]
- Solutions to bridge this gap involve optimizing data, including designing virtual-to-real data ratios and leveraging AIGC to generate diverse datasets that balance volume and authenticity (a minimal data-mixing sketch follows this summary) [11][12]

Group 2: Data Utilization in Embodied Intelligence
- There is a consensus among experts that while real data is ideal for training, the current landscape necessitates a reliance on simulation data due to the scarcity of high-quality real-world datasets in the embodied intelligence field [20][21]
- Simulation data plays a crucial role in foundational model iteration and testing, as it allows for safe and efficient algorithm testing before deploying on real machines [21][24]
- The potential of simulation in scaling reinforcement learning is highlighted, as well-constructed simulators can facilitate large-scale parallel training, enabling models to learn from scenarios that are difficult to capture in real life [24][26]

Group 3: World Models and Future Directions
- The article emphasizes the significance of world models in future research, particularly in areas like autonomous driving and embodied intelligence, showcasing their potential in general visual understanding and long-term planning [30][32]
- Challenges remain in automating the generation of simulation data and ensuring the diversity and generalization of actions within simulations, which are critical for advancing the field [28][29]
- The introduction of new modalities, such as force and touch, into world models is suggested as a promising direction for future research, despite current limitations in computational resources [30][31]

Group 4: Reaction to Boston Dynamics Technology
- Experts acknowledge the advanced capabilities of Boston Dynamics robots, particularly their smooth execution of complex tasks that require sophisticated motion control [33][37]
- The discussion highlights the importance of hardware and data in the field of embodied intelligence, with Boston Dynamics' approach serving as a benchmark for future developments [37][39]
- The consensus is that the seamless performance of these robots is attributed not only to hardware differences but also to superior motion control techniques that could inform future research in embodied intelligence [39][41]
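The "design the virtual-to-real data ratio" point above can be illustrated with a minimal sketch: draw each training batch from a large simulation pool and a small real-robot pool in a fixed proportion. The dataset sizes and the 20% real fraction below are assumptions for illustration, not recommendations from the panel.

```python
# Minimal sketch: mix abundant simulation data with scarce real-robot data per batch.
import random

sim_data  = [{"source": "sim",  "id": i} for i in range(10_000)]   # cheap, abundant
real_data = [{"source": "real", "id": i} for i in range(500)]      # scarce, expensive

def sample_mixed_batch(batch_size=32, real_fraction=0.2):
    """Draw a batch where roughly `real_fraction` of samples come from real data."""
    n_real = max(1, int(batch_size * real_fraction))
    n_sim = batch_size - n_real
    batch = random.sample(real_data, n_real) + random.choices(sim_data, k=n_sim)
    random.shuffle(batch)
    return batch

batch = sample_mixed_batch()
print(sum(1 for x in batch if x["source"] == "real"), "real samples of", len(batch))
```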
Sim, Real, or World Model? The "Dilemma" of Embodied-Intelligence Data and Its Solutions
具身智能之心· 2025-10-01 12:48
Core Viewpoint
- The article discusses the ongoing debate in the field of embodied intelligence regarding the reliance on simulation efficiency versus real-world data, and the potential of world models to bridge the gap between these two approaches [2]

Group 1: Understanding the Sim-to-Real Gap
- The "Sim-to-Real gap" refers to the discrepancies between simulated environments and real-world scenarios, primarily due to incomplete simulations that fail to accurately replicate visual and physical details [3]
- Key factors contributing to this gap include limited simulation data, which weakens model generalization and restricts adaptability to specific scenarios [3]
- To narrow this gap, optimization around data is essential, including designing virtual and real data ratios based on model requirements and leveraging AIGC to generate diverse and realistic data [3]

Group 2: Data Utilization in Embodied Intelligence
- There is a consensus among experts that while real data is ideal for training, simulation data plays a crucial role in the foundational model iteration and testing phases [15][18]
- Real data is often limited in the field of embodied intelligence, making it challenging to meet the high expectations for diverse task performance [15]
- Simulation data is currently seen as a necessary resource, especially for testing algorithms and avoiding potential damage in real-world experiments [15][18]

Group 3: Future Directions and Challenges
- The development of world models is viewed as a promising direction for the future of embodied intelligence, with potential applications in autonomous driving and other areas [25]
- Key challenges include the need for automated generation of simulation data and enhancing the diversity of actions within simulation environments [21][23]
- The integration of new modalities, such as force and touch, into world models is suggested as a valuable research direction [23]

Group 4: Reaction to Boston Dynamics Technology
- Experts acknowledge the advanced capabilities of Boston Dynamics robots, particularly their smooth execution of complex tasks involving full-body movements [26][30]
- The discussion highlights the importance of hardware and data in achieving high performance in embodied intelligence systems, with Boston Dynamics setting a benchmark in the field [30]
- The need for further exploration in motion control techniques to enhance the fluidity of robotic movements is emphasized [32]
A $100 Billion Blockbuster Investment! The Communications ETF (515880) Opens More Than 4% Higher, with Optical Modules Making Up 50% of Holdings
Sou Hu Cai Jing· 2025-09-23 02:07
Group 1
- Nvidia plans to invest up to $100 billion in OpenAI and provide data center chips, with the first $10 billion investment starting after a final agreement on chip procurement [4][5]
- The partnership aims to deploy at least 10 GW of Nvidia chips for OpenAI's AI infrastructure, with the first deployment phase expected to be operational by the second half of 2026 [4][6]
- The investment and chip supply agreement is part of a broader trend of tech giants collaborating in the AI space, with significant capital expenditures from major cloud providers [5][9]

Group 2
- The communication ETF (515880) has gained over 105% year-to-date, making it the top-performing ETF in the A-share market, with a total size exceeding 11.7 billion yuan [3][11]
- The ETF's composition includes 50% in optical modules, reflecting strong demand in the AI and data center sectors [3][11]
- The AI industry is experiencing rapid growth, with North American cloud providers' capital expenditures reaching $95.8 billion in Q2 2025, a 64% year-on-year increase [6][9]
Why Is Google Back in the Game?
36Ke· 2025-09-06 23:40
Group 1
- Apple is restarting its collaboration with Google, considering using Gemini to support the revamped Siri, expected to launch in 2026 [1]
- The partnership could significantly enhance Google's AI technology by providing access to millions of iPhone users, marking a milestone in its influence [1][2]
- Gemini has made substantial progress in performance and user numbers over the past year, positioning itself among the top models in the LLM arena [2][10]

Group 2
- Gemini's website traffic surged from 284 million visits in February to 700 million in July, while ChatGPT received 5.72 billion visits [6]
- As of July 2025, Gemini reached 450 million monthly active users, a notable increase from 400 million in May [7]
- Gemini 2.5 Pro achieved the highest IQ ranking in AI, indicating its advanced capabilities in logic reasoning and complex task handling [10][12]

Group 3
- Google's Gemini is ranked second in website traffic, attracting about 12% of ChatGPT's traffic, with a significant user base on mobile [5]
- The introduction of the "Nano Banana" model has revolutionized the AI image generation space, showcasing superior image quality and user-friendly operations [13][15]
- The video AI model Veo3 has gained acclaim for its high-quality video generation, becoming a practical tool for professional production processes [19][21]

Group 4
- Google's TPU has become the world's most advanced AI chip, designed specifically for AI tasks, ensuring the company is not facing power supply anxiety [27][29]
- The integration of AI capabilities into Google's existing platforms, such as Chrome and Android, allows for rapid deployment and optimization based on user data [31]
- Google's talent acquisition strategy includes offering competitive salaries and optimizing organizational structures to enhance AI application development [34][35]
Tesla Optimus: The World Model Will End It All
自动驾驶之心· 2025-09-03 23:33
Core Viewpoint
- Tesla has shifted from imitation learning to video learning and is now focusing on developing a world model as the ultimate solution for its Optimus robot, which would enable it to understand and interact with the physical world the way a child learns about its environment [5][12][17]

Group 1: Learning Approaches
- Imitation learning achieved end-to-end processing but faced issues with data generalization [6]
- Video learning addresses data diversity but struggles with scale and cost [6]
- The world model is proposed as a solution that encompasses physical knowledge of the real world, allowing robots to learn autonomously [6][12]

Group 2: World Model Development
- The world model is a large-scale model that learns from real-world videos, understanding physical laws such as gravity and material properties [6][12]
- Google's Genie3 is highlighted as an example of a world model that creates an interactive 3D physical environment, allowing users to engage with it [9][11]

Group 3: Application to Robotics
- The Optimus robot will utilize a small amount of real-world video to fine-tune its understanding of physical laws and its own mechanics [12][14]
- Engineers can generate vast amounts of realistic simulation videos from simple natural language commands, which can then be used to train the robot's AI efficiently [14][16]
- This method allows for near-zero-cost and zero-risk trial-and-error learning in virtual environments, significantly enhancing the robot's robustness and adaptability (a toy sketch of such "learning in imagination" follows this summary) [16]

Group 4: Industry Context
- Many companies in the autonomous driving sector have not yet achieved end-to-end solutions and are still in the earlier stages of data collection and imitation learning [17]
- The article emphasizes the long journey ahead before Tesla's Optimus robot fully realizes the potential of the world model, contrasting it with the current state of many domestic humanoid robot companies [17]
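The near-zero-cost trial-and-error idea in Group 3 can be illustrated with a toy "learning in imagination" loop: a policy is improved entirely against rollouts produced by a learned world model rather than real hardware. Everything below (the linear dynamics stand-in, the placeholder reward, the random-search update) is an assumption for illustration and says nothing about Tesla's actual Optimus training stack.

```python
# Toy sketch: improve a policy against rollouts from a learned world model, not real hardware.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON = 8, 2, 20

# Stand-in for a learned world model: predicts the next latent state from state and action.
A = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
B = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
def world_model_step(state, action):
    return state @ A.T + action @ B.T

# Simple linear policy; its parameters are what we optimize "in imagination".
policy_W = np.zeros((ACTION_DIM, STATE_DIM))
init_state = rng.normal(size=STATE_DIM)

def imagined_return(W):
    """Roll the policy out inside the world model and score the imagined trajectory."""
    state = init_state.copy()
    total = 0.0
    for _ in range(HORIZON):
        action = W @ state
        state = world_model_step(state, action)
        total -= float(state @ state)   # placeholder reward: stay near the origin
    return total

# Crude random-search improvement loop: all trial and error happens in the imagined
# environment, so it is cheap and risk-free by construction.
best = imagined_return(policy_W)
for _ in range(200):
    candidate = policy_W + rng.normal(scale=0.05, size=policy_W.shape)
    score = imagined_return(candidate)
    if score > best:
        policy_W, best = candidate, score
print("best imagined return:", round(best, 2))
```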