Genie3
Standing at the intersection of content creators and robots: the evolution of 3D digital humans
36Kr · 2025-10-29 11:24
Core Insights
- The rise of 3D digital humans is transforming content creation and interaction, moving from rigid, scripted avatars to dynamic, responsive entities capable of real-time expression and movement [1][2]
- The technology behind 3D digital humans is evolving, with significant advances in AI and rendering techniques that reduce costs and improve quality [7][36]
- The integration of 3D digital humans into industries such as gaming and film is expected to grow, with potential applications in customer service and interactive experiences [1][12]

Group 1: Technological Advancements
- 3D digital humans have progressed from basic, scripted models to sophisticated entities that can generate voice, expressions, and movements in real time [1][2]
- Models such as Sora2 demonstrate the potential for generating human-like actions and interactions, although challenges remain in error correction and precise control [3][5]
- Combining 2D and 3D training techniques is being explored to improve the expressiveness and accuracy of digital humans [5][7]

Group 2: Cost Efficiency
- Creating and deploying 3D digital humans costs significantly less than with traditional methods; estimates suggest the costs are a fraction of those associated with large models [7][36]
- AI rendering and computation techniques allow inexpensive hardware to run complex 3D models, making the technology more accessible [36][38]
- Generating high-quality 3D content without expensive graphics engines or hardware is a game-changer for the industry [36][39]

Group 3: Industry Applications
- 3D digital humans are being positioned as the next generation of content producers, with applications in live streaming, customer service, and entertainment [2][12]
- The technology is expected to bridge the gap between virtual and physical interactions, enhancing user experiences across sectors [1][12]
- The potential to integrate 3D digital humans into VR and AR environments opens new avenues for immersive experiences [5][12]

Group 4: Data and Model Development
- Accumulating high-quality 3D animation data is crucial for training effective AI models; the company claims to have over 1,000 hours of such data [24][25]
- Video data is being combined with 3D data to improve model training and the realism of digital human interactions [25][26]
- A "3D action model" is under development that aims to automate the generation of 3D motion data for robots, further bridging the digital and physical realms [46][48]

Group 5: Future Prospects
- The company aims to evolve from a 3D digital human provider into a platform on which other developers build applications using its technology [12][13]
- The potential impact of 3D digital humans on the film industry is acknowledged, although reaching Hollywood-level quality remains a challenge [33][34]
- The ongoing evolution of AI and robotics suggests a promising future for 3D digital humans across many applications, with significant advances expected in the coming years [57][59]
AI rally kicks off: these sub-sectors are worth watching
Mei Ri Jing Ji Xin Wen · 2025-10-23 01:39
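Looking at this year, AI catalysts have been evident worldwide and development has accelerated markedly, especially in China and North America, where GPT-5 was recently released. GPT-5 may comprise several models and market reviews are mixed, but overall its performance has improved noticeably. At the same time, its token prices have begun to fall rapidly, so its price-performance ratio will rise quickly. That improvement ultimately rests on compute hardware and resources, which we discuss later. Besides GPT-5, investors following the space will know that many advanced models have been released both overseas and in China. For example, Google's DeepMind released the world model Genie3. What distinguishes it from earlier pure video-generation models is that it maintains good consistency during generation and better understands the physical laws of the world, so the images and videos it generates are more logical; for example, a basketball bounced on the floor will not fail to come back up. Besides Genie3, social feeds have recently been flooded with Sora2. Clearly, foundation models, whether text-to-image or text-to-video, are developing quickly. Judging from vendors' results this year, earnings growth has been driven mainly by hardware. As we know, products based on NVIDIA's new-generation Blackwell architecture began ramping rapidly in the first and second quarters of this year. On its fiscal Q1 earnings call in May, NVIDIA said that hyperscale customers were deploying an average of about 1,000 GB200 NVL72 racks per week. With 72 GPUs per rack, an average of 1,000 racks deployed per week means weekly GPU sales ...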
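The rack figures above imply a simple weekly deployment rate. The sketch below only multiplies the two numbers the passage states (72 GPUs per GB200 NVL72 rack, about 1,000 racks per week); the function name is our own, and the result counts GPUs deployed, not the sales figure the truncated sentence goes on to cite.

```python
# Figures quoted in the passage above; names are illustrative.
GPUS_PER_RACK = 72        # GPUs in one GB200 NVL72 rack
RACKS_PER_WEEK = 1_000    # approximate average weekly hyperscaler deployment


def weekly_gpus(gpus_per_rack: int, racks_per_week: int) -> int:
    """Return the number of GPUs deployed per week implied by a rack deployment rate."""
    return gpus_per_rack * racks_per_week


print(weekly_gpus(GPUS_PER_RACK, RACKS_PER_WEEK))  # 72000
```

Because the 1,000-rack figure is itself approximate, the 72,000 result is an order-of-magnitude estimate rather than a reported number.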
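Jinqiu Fund-led Manifold AI closes two consecutive funding rounds totaling 100 million yuan to build a next-generation world model for embodied intelligence | Jinqiu Spotlight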
Jinqiu Ji · 2025-10-20 12:18
Core Insights
- Jinqiu Fund has completed an investment in Manifold AI, a company focused on world models and embodied intelligence that has raised a total of over 100 million yuan across two funding rounds [2][4]
- Jinqiu Fund emphasizes a long-term investment philosophy, seeking groundbreaking technologies and innovative business models in general artificial intelligence [3][16]

Investment Overview
- Jinqiu Fund led Manifold AI's recent angel round, with participation from co-investors including Chuangweiye and existing shareholder Inno Angel Fund [4]
- The seed round was led by Inno Angel Fund, with follow-on investment from the Waterwood Tsinghua Alumni Seed Fund [4]

Technological Focus
- Manifold AI's original embodied world model technology aims to drive the large-scale deployment of robot brains, addressing general-purpose robotics challenges such as diverse embodiments, limited data, and fragmented applications [6][16]
- The company uses a World Model Action (WMA) approach, pre-training on large volumes of egocentric video data, which it expects to foster the emergence of physical-space intelligence [10][16]

Industry Context
- As robotics evolves rapidly, autonomous operating capability is critical for large-scale deployment [6]
- The shift by companies such as Tesla and Figure AI toward training on extensive egocentric video data reflects a broader industry trend [6][7]

Team and Leadership
- Manifold AI's core team is based in Beijing; its members have backgrounds in robotics and large models and experience building AI products with millions of users [12]
- Founder and CEO Dr. Wu Wei has extensive management experience and previously led world model development at SenseTime [13][16]

Future Outlook
- Jinqiu Fund expects to explore next-generation embodied-intelligence world models together with Manifold AI as the industry moves toward a deeper understanding of how machines interact with the world [17]
"Godmother of AI" unveils her latest world model
Cailian Press · 2025-10-17 12:28
Group 1
- The article discusses the launch of RTFM (Real-Time Frame Model), a new real-time interactive 3D world model from World Labs, the company founded by AI expert Fei-Fei Li. The model is designed around three key principles (efficiency, scalability, and persistence) and can run on a single H100 GPU to render persistent, consistent 3D worlds [2]
- World Labs emphasizes that as world model technology advances, its demand for computing power will rise sharply, surpassing that of today's large language models (LLMs). To deliver interactive 4K video streaming at 60 FPS, a traditional video architecture would need to generate more than 100,000 tokens per second, which is economically unfeasible on current computing infrastructure [2]
- The article also highlights a strategic partnership between OpenAI and Broadcom to deploy 10 gigawatts of AI accelerators, which is expected to give OpenAI a diversified computing supply, reduce its reliance on a single supplier, and drive down computing costs through competition [3]

Group 2
- The article notes the "Jevons Paradox": advances in AI models that improve computing efficiency can increase the total consumption of computing resources. For instance, the DeepSeek R1 model, released earlier this year, delivers strong AI performance yet is expected to increase, rather than reduce, demand for computing resources [4]
- World Labs previously released Marble, a model that generates 3D worlds from a single image or text prompt and offers better geometric structure and more diverse styles than its predecessor. Fei-Fei Li has said that world models matter because they can understand and reason about both textual information and the physical laws by which the world operates [4]
- Companies across the AI and device sectors are increasingly investing in world models: xAI has hired experts from NVIDIA, and competitors such as Meta and Google are also focusing on the area. In China, robotics firms such as Yushu (Unitree) and Zhiyuan (AgiBot) have open-sourced their world models [4]

Group 3
- Dongwu Securities notes that as computing power becomes cheaper and more accessible, developers will set more complex models and systems as new benchmarks, increasing parameters, context length, and parallelism. While architectural iterations may reduce the compute required for a single inference or training run, video-generating models such as Genie3 may require substantially more computing power to meet demand [5]
- A higher ceiling for AI computing demand and an improved competitive landscape are expected to support a higher valuation framework for AI computing than 4G/5G received, along with stronger beta [5]
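The Jevons Paradox point in Group 2 can be made concrete with a toy model. The sketch below assumes, purely for illustration, that the price of a task scales with the compute it needs and that demand for tasks follows a constant-elasticity curve; neither assumption comes from the article or from Dongwu Securities. Under that model, total compute = tasks × compute per task ∝ c^(1 - e), so when the demand elasticity e exceeds 1, an efficiency gain (smaller c) raises total consumption.

```python
def total_compute(compute_per_task: float, elasticity: float, scale: float = 1.0) -> float:
    """Total compute consumed under a hypothetical constant-elasticity demand curve.

    Tasks demanded = scale * compute_per_task ** (-elasticity), i.e. the price of a
    task is assumed proportional to the compute it needs.
    """
    tasks_demanded = scale * compute_per_task ** (-elasticity)
    return tasks_demanded * compute_per_task


# Efficiency doubles: each task now needs half the compute.
for e in (1.5, 0.5):
    ratio = total_compute(0.5, e) / total_compute(1.0, e)
    print(e, round(ratio, 3))
# 1.5 1.414  -> elastic demand: total compute rises (Jevons Paradox)
# 0.5 0.707  -> inelastic demand: total compute falls
```

The elasticity values are arbitrary; the point is only that the direction of the effect flips at e = 1.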
Sim2Real can't solve the data dilemma of embodied intelligence
Autonomous Driving Heart · 2025-10-03 03:32
Core Viewpoint
- The article discusses the ongoing debate in embodied intelligence over relying on simulation efficiency versus real-world data, and the potential of world models to redefine how data is used in this domain [4][8]

Group 1: Understanding the Sim-to-Real Gap
- The "Sim-to-Real gap" refers to discrepancies between simulated environments and real-world scenarios, arising mainly because simulations are incomplete and fail to reproduce visual and physical details accurately [8]
- Research indicates that because simulation models do not fully capture real-world complexity, the resulting models generalize poorly and stay confined to specific scenarios [8][11]
- Proposed solutions center on optimizing the data, including designing the ratio of virtual to real data and using AIGC to generate diverse datasets that balance volume against authenticity [11][12]

Group 2: Data Utilization in Embodied Intelligence
- Experts agree that while real data is ideal for training, the scarcity of high-quality real-world datasets in embodied intelligence currently makes simulation data a necessity [20][21]
- Simulation data plays a crucial role in iterating on and testing foundation models, since algorithms can be tested safely and efficiently before deployment on real machines [21][24]
- Simulation also shows potential for scaling reinforcement learning: well-built simulators enable large-scale parallel training, letting models learn from scenarios that are hard to capture in real life [24][26]

Group 3: World Models and Future Directions
- The article emphasizes the importance of world models for future research, particularly in autonomous driving and embodied intelligence, pointing to their potential for general visual understanding and long-horizon planning [30][32]
- Challenges remain in automating the generation of simulation data and in ensuring the diversity and generalization of actions within simulations, both critical for advancing the field [28][29]
- Adding new modalities such as force and touch to world models is suggested as a promising research direction, despite current limits on computational resources [30][31]

Group 4: Reactions to Boston Dynamics Technology
- Experts acknowledge the advanced capabilities of Boston Dynamics robots, particularly their smooth execution of complex tasks that demand sophisticated motion control [33][37]
- The discussion highlights the importance of hardware and data in embodied intelligence, with Boston Dynamics' approach serving as a benchmark for future development [37][39]
- The consensus is that these robots' seamless performance stems not only from hardware differences but also from superior motion control techniques that could inform future embodied intelligence research [39][41]
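One of the data-side remedies described above, designing the ratio of virtual to real data, is often implemented as a fixed-ratio batch sampler. The sketch below is a generic illustration, not a method described in the article; `mixed_batch` and its `real_fraction` knob are hypothetical names. It samples with replacement so that a small real-world dataset can still fill its share of every batch.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(sim: Sequence[T], real: Sequence[T], batch_size: int,
                real_fraction: float, rng: random.Random) -> List[T]:
    """Draw one training batch in which a fixed share of samples comes from real data.

    Both pools are sampled with replacement, so a small real dataset can still
    fill its share; the batch is shuffled so the two sources are interleaved.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    n_real = round(batch_size * real_fraction)
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(sim) for _ in range(batch_size - n_real)]
    rng.shuffle(batch)
    return batch


# Usage: 10,000 simulated episodes, only 50 real ones, 25% real per batch.
sim_pool = [("sim", i) for i in range(10_000)]
real_pool = [("real", i) for i in range(50)]
batch = mixed_batch(sim_pool, real_pool, batch_size=64, real_fraction=0.25,
                    rng=random.Random(0))
print(sum(1 for source, _ in batch if source == "real"))  # 16
```

Raising `real_fraction` trades sample volume for authenticity, which is the balance the panelists describe; the right value is an empirical question the article does not settle.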
Sim, Real, or World Model? The "dilemma" of embodied intelligence data and how to solve it
Embodied Intelligence Heart · 2025-10-01 12:48
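On the road to embodied intelligence, should we rely on the efficiency of simulation, on real-world data, or even look to world models to change the rules of the game? As physics simulation moves into deep water, will the "simulation camp" have the last laugh? Yet Sergey Levine, co-founder of Physical Intelligence (PI) and a pioneer in embodied intelligence, has always insisted that surrogate data is a spork (a fork-spoon hybrid that is worse than either a spoon or a fork) and that real interaction data is irreplaceable. Is this a limitation of strategy, or an iron law rooted in the nature of data? Now Genie3 has burst onto the scene as a world model that can generate interactive, dynamic environments from text and even drive online planning. Does this mean we are on the eve of the end of the binary opposition between "simulation" and "reality"? Will world models become the ultimate answer to the data problem, or are they just simulation in another form, still unable to escape the Sim-to-Real gap? For this technical roundtable, we invited four outstanding young scientists from China's Sim2Real field. Together with them, we discuss the frontier, from high-fidelity 3D asset construction, the physical bottlenecks of neural rendering, and articulated-body structure optimization to the decoupled design of VLA models: does the data road for embodied intelligence lead to simulation, to reality, or to the one that is ...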
A $100 billion blockbuster investment! Communication ETF (515880) jumps more than 4% at the open; optical modules make up 50% of holdings
Sou Hu Cai Jing · 2025-09-23 02:07
Group 1
- Nvidia plans to invest up to $100 billion in OpenAI and supply it with data center chips; the first $10 billion will be invested once a final agreement on chip purchases is reached [4][5]
- The partnership aims to deploy at least 10 GW of Nvidia chips for OpenAI's AI infrastructure, with the first phase expected to come online in the second half of 2026 [4][6]
- The investment and chip supply agreement is part of a broader trend of tech giants collaborating on AI, alongside heavy capital expenditure by major cloud providers [5][9]

Group 2
- The communication ETF (515880) has gained over 105% year-to-date, making it the top-performing ETF in the A-share market, with total assets exceeding 11.7 billion yuan [3][11]
- Optical modules account for 50% of the ETF's holdings, reflecting strong demand from the AI and data center sectors [3][11]
- The AI industry is growing rapidly: North American cloud providers' capital expenditure reached $95.8 billion in Q2 2025, up 64% year on year [6][9]
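The 64% year-on-year figure also implies what North American cloud capex was a year earlier. The sketch below simply inverts the growth rate; because both inputs are rounded, the result is approximate, and the function name is our own.

```python
def prior_year_value(current: float, yoy_growth: float) -> float:
    """Back out the year-ago value from a current value and its year-on-year growth rate."""
    return current / (1.0 + yoy_growth)


q2_2025_capex_bn = 95.8   # North American cloud capex, Q2 2025, USD billions (from the text)
yoy_growth = 0.64         # 64% year-on-year increase (from the text)
print(round(prior_year_value(q2_2025_capex_bn, yoy_growth), 1))  # 58.4
```

So the quarter-on-year-ago jump is roughly $37 billion, from about $58.4 billion to $95.8 billion.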
Why is Google back on top?
36Kr · 2025-09-06 23:40
Group 1
- Apple is restarting its collaboration with Google and considering Gemini to power a revamped Siri, expected to launch in 2026 [1]
- The partnership could significantly boost Google's AI technology by giving it access to millions of iPhone users, marking a milestone for its influence [1][2]
- Gemini has made substantial progress in performance and user numbers over the past year, placing it among the top models in the LLM arena rankings [2][10]

Group 2
- Gemini's website traffic surged from 284 million visits in February to 700 million in July, while ChatGPT received 5.72 billion visits [6]
- Gemini reached 450 million monthly active users as of July 2025, up from 400 million in May [7]
- Gemini 2.5 Pro ranked highest on an AI IQ ranking, indicating advanced capabilities in logical reasoning and complex task handling [10][12]

Group 3
- Gemini ranks second in website traffic, drawing about 12% of ChatGPT's traffic, with a significant mobile user base [5]
- The "Nano Banana" model has revolutionized AI image generation, offering superior image quality and user-friendly operation [13][15]
- The video model Veo3 has been acclaimed for high-quality video generation and has become a practical tool in professional production workflows [19][21]

Group 4
- Google's TPU has become the world's most advanced AI chip, designed specifically for AI workloads, so the company does not face anxiety over computing-power supply [27][29]
- Integrating AI capabilities into existing platforms such as Chrome and Android lets Google deploy quickly and optimize based on user data [31]
- Google's talent strategy includes competitive salaries and streamlined organizational structures to accelerate AI application development [34][35]
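The traffic and user figures above can be cross-checked with a few ratios. The sketch below assumes that ChatGPT's 5.72 billion visits refer to the same month as Gemini's 700 million (July), which the text implies but does not state; under that assumption, Gemini's share comes out close to the roughly 12% cited in Group 3.

```python
# Figures from the text; pairing ChatGPT's visits with July is an assumption.
gemini_feb_visits = 284e6
gemini_jul_visits = 700e6
chatgpt_visits = 5.72e9
gemini_mau_may = 400e6
gemini_mau_jul = 450e6

growth_multiple = gemini_jul_visits / gemini_feb_visits   # Feb -> Jul traffic growth
share_of_chatgpt = gemini_jul_visits / chatgpt_visits     # Gemini visits relative to ChatGPT
mau_growth = gemini_mau_jul / gemini_mau_may - 1          # May -> Jul MAU growth

print(f"{growth_multiple:.2f}x")   # 2.46x
print(f"{share_of_chatgpt:.1%}")   # 12.2%
print(f"{mau_growth:.1%}")         # 12.5%
```

The visit count grew far faster (about 2.5x) than monthly active users (12.5%), which suggests rising engagement per user, though the article does not break this down.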
Tesla Optimus: world models will end it all
Autonomous Driving Heart · 2025-09-03 23:33
Core Viewpoint
- Tesla has moved from imitation learning to video learning and is now focusing on a world model as the ultimate solution for its Optimus robot, enabling it to understand and interact with the physical world much as a child learns about its surroundings [5][12][17]

Group 1: Learning Approaches
- Imitation learning achieved end-to-end processing but struggled to generalize beyond its training data [6]
- Video learning addresses data diversity but struggles with scale and cost [6]
- The world model is proposed as a solution that encodes physical knowledge of the real world, allowing robots to learn autonomously [6][12]

Group 2: World Model Development
- The world model is a large-scale model that learns from real-world videos and understands physical laws such as gravity and material properties [6][12]
- Google's Genie3 is highlighted as an example of a world model that creates interactive 3D physical environments users can engage with [9][11]

Group 3: Application to Robotics
- The Optimus robot will use a small amount of real-world video to fine-tune its understanding of physical laws and of its own mechanics [12][14]
- Engineers can generate vast amounts of realistic simulation video from simple natural-language commands and use it to train the robot's AI efficiently [14][16]
- This enables near-zero-cost, zero-risk trial-and-error learning in virtual environments, significantly improving the robot's robustness and adaptability [16]

Group 4: Industry Context
- Many autonomous driving companies have not yet achieved end-to-end solutions and remain at the earlier stages of data collection and imitation learning [17]
- The article stresses that Tesla's Optimus still has a long way to go to fully realize the world model's potential, and contrasts this with the current state of many domestic humanoid robot companies [17]
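The idea in Group 3, fitting a dynamics model to a small amount of real data and then doing trial and error inside that model, can be illustrated with a deliberately tiny stand-in. The sketch below is not Tesla's method (the article gives no implementation): the "world" is one-dimensional vertical motion under gravity, the "world model" is a least-squares linear fit of the velocity change against thrust, and all names and constants are hypothetical. Planning then searches candidate thrusts purely in imagined rollouts, without touching the real system.

```python
import random

DT = 0.1   # time step in seconds (hypothetical)
G = 9.8    # gravity in the "real" system; the learner never reads this directly


def real_step(v: float, u: float) -> float:
    """Stand-in for the physical world: vertical velocity under gravity and thrust u."""
    return v + (u - G) * DT


def fit_world_model(transitions):
    """Least-squares fit of dv = a + b * u from observed (v, u, v_next) transitions."""
    us = [u for _, u, _ in transitions]
    dvs = [v_next - v for v, _, v_next in transitions]
    n = len(us)
    mean_u, mean_dv = sum(us) / n, sum(dvs) / n
    cov = sum((u - mean_u) * (dv - mean_dv) for u, dv in zip(us, dvs))
    var = sum((u - mean_u) ** 2 for u in us)
    b = cov / var
    a = mean_dv - b * mean_u
    return a, b


def imagine(v: float, u: float, steps: int, model) -> float:
    """Roll the learned model forward; no real-world interaction happens here."""
    a, b = model
    for _ in range(steps):
        v = v + a + b * u
    return v


rng = random.Random(0)
# A small amount of "real" experience: random thrusts applied at random velocities.
data = []
for _ in range(20):
    v0, u0 = rng.uniform(-5, 5), rng.uniform(0, 20)
    data.append((v0, u0, real_step(v0, u0)))
model = fit_world_model(data)

# Trial and error entirely inside the learned model: find the thrust that hovers.
candidates = [x * 0.1 for x in range(201)]   # thrusts 0.0 .. 20.0
best_u = min(candidates, key=lambda u: abs(imagine(0.0, u, 50, model)))
print(round(best_u, 1))  # 9.8, the thrust that cancels gravity
```

In the article's framing the two-parameter fit would instead be a large network trained on video, but the division of labor is the same: scarce real data to fit the model, cheap imagined rollouts to explore.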
Livestream: the "embodied data dilemma": where simulation, real data, and world models collide and converge
Embodied Intelligence Heart · 2025-08-29 16:03
Core Viewpoint
- The article examines the intersection of simulation technology, real data, and world models in embodied intelligence, highlighting the ongoing debate over simulation versus real data and potential breakthroughs in world modeling [3][11]

Group 1: Roundtable Discussion
- The roundtable focuses on the "data dilemma" in embodied intelligence, featuring four young scientists who explore the boundary between simulation and real interaction as well as advances in world models such as Genie [3][11]
- It examines Sergey Levine's assertion that real data is irreplaceable, asking whether this reflects a strategic choice or an inevitable path in AI's evolution [11]

Group 2: Key Participants
- Li Hongyang, an assistant professor at the University of Hong Kong, leads OpenDriveLab and has made significant contributions to end-to-end autonomous driving, including the award-winning UniAD [4]
- Zhao Hao, an assistant professor at Tsinghua University, specializes in computer vision for robotics and has co-founded more than ten startups since 2009 [5]
- Gu Jiayuan, an assistant professor at ShanghaiTech University, focuses on generalizable robotic decision-making models and has received multiple awards for his research [6][7]
- Mu Yao, an assistant professor at Shanghai Jiao Tong University, has published extensively at top conferences and received numerous academic honors [7]