Forge Renderer

After a year of burning cash, has Fei-Fei Li's "spatial intelligence" vision changed?
Synced (机器之心) · 2025-06-13 12:02
Group 1
- The core vision of World Labs, founded by Fei-Fei Li, centers on spatial intelligence and world models, aiming to build AI systems that can understand and generate 3D physical worlds [5][6][7]
- In its first year, World Labs raised $230 million and reached a valuation of over $1 billion, making it a notable player in the AI sector [5][6]
- The company has released technologies such as its "world generation" model and the Forge renderer, which enable the creation of interactive 3D environments from a single image [6][7]
Group 2
- Fei-Fei Li argues that current large language models (LLMs) are limited in describing and understanding the 3D physical world, which makes spatial intelligence a crucial component of AI [5][6]
- The success of LLMs offers a methodology for spatial intelligence, but true breakthroughs require interdisciplinary integration, particularly between AI and computer graphics [7][8]
- Advances in computing power, data availability, and engineering capability have made the pursuit of "world models" a realistic goal [7]
Tencent Research Institute AI Express 20250604
Tencent Research Institute (腾讯研究院) · 2025-06-03 14:49
Group 1
- Microsoft launched Bing Video Creator, powered by OpenAI's Sora, which lets users generate various types of videos from natural-language prompts [1]
- The service is free and offers two generation modes, fast and standard; users initially get 10 fast generations, and each video is 5 seconds long [1]
- Built-in safeguards are in place to prevent misuse, and every generated video carries content credentials and provenance information; the service is currently not available in China [1]
Group 2
- Manus introduced a new slides feature that can generate 8 professional PPT slides in 10 minutes, receiving positive feedback [2]
- In testing, Manus automatically searched for information, planned the structure, and generated the content, with support for instant edits and multiple export formats, although some pages were not displayed completely [2]
- Compared with Genspark, Manus is faster (10 minutes vs. 20 minutes) and more capable, and was rated the best PPT creation tool currently available [2]
Group 3
- Character.ai launched AvatarFX, which makes static images speak, sing, and interact with users [3]
- AvatarFX is based on the DiT architecture and offers high fidelity and strong temporal consistency, staying stable even in complex scenes with multiple characters and long sequences [3]
- Character.ai also introduced several AI creation features, including immersive storytelling and animated chat, while facing an antitrust investigation regarding Google's acquisition of the platform [3]
Group 4
- Fellou 2.0 was officially released; it works as a "Jarvis"-like intelligent agent that can run batches of AI tasks 24/7 [4][5]
- The new version is 1.2-1.5 times faster, supports more diverse deliverables, and is more reliable, with the task success rate rising from 31% to 80% [5]
- Built on the new Eko 2.0 architecture, it supports parallel processing of multiple tasks; a Windows version is planned, alongside continued improvements to user experience and model intelligence [5]
Group 5
- YouWare is a "vibe coding" platform for creators in the AI era that lets non-programmers turn ideas into web pages and share them online [6]
- Its core advantage is a "what you think is what you get" experience: users describe an idea, and AI generates the code so it can be viewed and shared immediately [6]
- YouWare is backed by its in-house AI Agent and Sandbox technology, is building an "Instagram"-like community, and uses a "Knot" reward mechanism to encourage high-quality content [6]
Group 6
- BAAI (Zhiyuan Research Institute) open-sourced Video-XL-2, a lightweight long-video understanding model that can efficiently process inputs of up to ten thousand frames on a single GPU [7]
- The model consists of a visual encoder, a dynamic token synthesis module, and a large language model; it is trained with a four-stage progressive method and introduces a segmented pre-filling strategy [7]
- Video-XL-2 outperforms all lightweight open-source models on mainstream benchmarks and encodes 2048 frames of video in just 12 seconds, with applications in film content analysis and anomalous-behavior monitoring [7]
Group 7
- Salesforce, the world's leading CRM platform, acquired the AI agent platform Moonhub, whose entire team is joining Salesforce to work on the Agentforce platform [8]
- Salesforce CEO Marc Benioff is bullish on intelligent agents and aims to deploy one billion agents through Agentforce by the end of 2025; Agentforce already has 3,000 paying customers [8]
- Moonhub specializes in recruiting agents that autonomously search for and screen candidates, complementing Salesforce's existing HR agent features and strengthening its position in the agent market [8]
Group 8
- Fei-Fei Li's World Labs open-sourced the Forge renderer, enabling real-time rendering of AI-generated 3D worlds on ordinary devices [10]
- Forge is a web-based 3D Gaussian Splatting (3DGS) renderer that integrates seamlessly with three.js and supports multiple splat objects, multiple cameras, and real-time animation and editing [10]
- Its key ideas are an efficient painter's algorithm that solves the splat sorting problem and a programmable data pipeline, which together let developers handle AI-generated 3D worlds as easily as triangle meshes [10]
Group 9
- Andrej Karpathy published a model selection guide, recommending GPT-4o for simple everyday questions and switching to o3 for complex tasks [11]
- In his breakdown, about 40% of usage is simple everyday questions answered with 4o, about 40% is complex or important problems handled by o3, and GPT-4.1 is used for refining code [11]
- The core principle is "either-or": if the task is important and you are willing to wait, choose o3; if it is unimportant and you just need a quick answer, choose 4o [11]
Group 10
- ChatGPT's memory system has two main components, saved memories and chat history; chat history is further divided into current-session history, conversation history, and user insights [12]
- Saved memories are implemented via the bio tool, while conversation history is indexed at multiple levels in a vector space [12]
- The memory mechanism significantly improves the user experience; the user-insight system in particular may account for over 80% of ChatGPT's improved understanding, shifting it from "you tell me" to "I can see" [12]
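Group 10 describes conversation history as indexed at multiple levels in a vector space. As a hedged illustration only, the TypeScript sketch below shows one plausible two-level similarity lookup: it first ranks past conversations by a summary embedding, then ranks the messages inside the top conversations. The `Message` and `Conversation` types, the `retrieve` function, and the precomputed embeddings are all hypothetical; this is not ChatGPT's actual implementation.

```typescript
// Hypothetical sketch: a two-level similarity lookup over past conversations.
// Not ChatGPT's implementation; types, names and precomputed embeddings are assumptions.

interface Message {
  text: string;
  embedding: number[];
}

interface Conversation {
  summaryEmbedding: number[]; // one vector summarizing the whole conversation (assumed)
  messages: Message[];
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// The k items most similar to the query, highest score first.
function topK<T>(query: number[], items: T[], vec: (item: T) => number[], k: number): T[] {
  return items
    .map((item) => ({ item, score: cosine(query, vec(item)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.item);
}

// Level 1 ranks conversations by their summary; level 2 ranks messages inside the winners.
function retrieve(query: number[], history: Conversation[], kConv: number, kMsg: number): string[] {
  const convs = topK(query, history, (c) => c.summaryEmbedding, kConv);
  const candidates = convs.flatMap((c) => c.messages);
  return topK(query, candidates, (m) => m.embedding, kMsg).map((m) => m.text);
}
```

The coarse first level keeps the fine-grained comparison limited to a few conversations; in a real system the embeddings would come from an embedding model, which is outside this sketch.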
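Fei-Fei Li's spatial intelligence unicorn open-sources its underlying technology! AI-generated 3D worlds run smoothly on every device: the "shader" of spatial intelligence is here
QbitAI (量子位) · 2025-06-03 04:26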
Core Viewpoint
- World Labs, co-founded by Fei-Fei Li, has open-sourced a core technology called Forge, a real-time 3D Gaussian Splatting renderer that runs smoothly across devices, including desktops, low-power mobile devices, and XR headsets [1][6]
Group 1: Technology Overview
- Forge is a web-based 3D Gaussian Splatting renderer that integrates with three.js and makes Gaussian splatting fully dynamic and programmable [2]
- Forge's low-level design is optimized for the GPU, playing a role similar to that of "shaders" in traditional 3D graphics [3]
- According to World Labs co-founder Ben Mildenhall, the technology lets developers handle AI-generated 3D worlds as easily as triangle meshes [5]
Group 2: Features and Capabilities
- Forge needs only a few lines of code to get running and supports multiple splat objects, multiple cameras, and real-time animation and editing [4]
- It is designed as a programmable 3D Gaussian Splatting engine, offering unprecedented control over how 3D Gaussian splats are generated, animated, and rendered [8]
- At the core of its design, the renderer sorts splats with a painter's algorithm [13]
Group 3: Rendering Process
- The key component managing rendering is ForgeRenderer, which compiles the complete list of splats in a three.js scene and determines their drawing order with an efficient bucket sort [14]
- Forge supports multi-view rendering: creating additional ForgeViewpoint objects lets the scene be rendered simultaneously from different perspectives [15]
Group 4: Future Plans
- World Labs aims to take multimodal AI from 2D pixel planes to full 3D worlds and plans to launch its first product in 2025 [17]
- The company intends to build tools for professionals such as artists, designers, developers, filmmakers, and engineers, targeting customers ranging from video game developers to film studios [17]
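Painter's-algorithm splat rendering draws splats from farthest to nearest so that alpha blending composites correctly, and a bucket sort makes that ordering roughly linear in the number of splats. The TypeScript sketch below illustrates the idea under stated assumptions: it is not Forge's source code, and the `sortSplatsBackToFront` function, the `Vec3` type, the bucket count, and sorting by each splat center's depth along the view direction are all hypothetical choices.

```typescript
// Sketch only: NOT Forge's implementation. Names, the bucket count and the use of
// center depth along the view direction are hypothetical choices for illustration.

interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Returns splat indices ordered farthest-to-nearest (painter's algorithm order).
function sortSplatsBackToFront(
  centers: Vec3[],
  camPos: Vec3,
  viewDir: Vec3, // assumed to be the normalized camera forward vector
  numBuckets = 65536, // hypothetical depth resolution
): number[] {
  const n = centers.length;
  if (n === 0) return [];

  // 1. Depth of each splat center along the view direction, plus the depth range.
  const depth = new Float64Array(n);
  let minD = Infinity;
  let maxD = -Infinity;
  for (let i = 0; i < n; i++) {
    const c = centers[i];
    const d =
      (c.x - camPos.x) * viewDir.x +
      (c.y - camPos.y) * viewDir.y +
      (c.z - camPos.z) * viewDir.z;
    depth[i] = d;
    minD = Math.min(minD, d);
    maxD = Math.max(maxD, d);
  }

  // 2. Quantize depth into buckets: bucket 0 holds the farthest splats.
  const range = maxD - minD || 1; // avoid division by zero when all depths are equal
  const bucketOf = new Uint32Array(n);
  const counts = new Uint32Array(numBuckets);
  for (let i = 0; i < n; i++) {
    const t = (maxD - depth[i]) / range; // 0 = farthest, 1 = nearest
    const b = Math.min(numBuckets - 1, Math.floor(t * numBuckets));
    bucketOf[i] = b;
    counts[b]++;
  }

  // 3. Exclusive prefix sum turns bucket counts into write offsets.
  const offsets = new Uint32Array(numBuckets);
  for (let b = 1; b < numBuckets; b++) offsets[b] = offsets[b - 1] + counts[b - 1];

  // 4. Scatter indices; within a bucket the input order is preserved.
  const order = new Array<number>(n);
  for (let i = 0; i < n; i++) order[offsets[bucketOf[i]]++] = i;
  return order;
}

// Usage: camera at the origin looking down +z; the splat at z = 5 is drawn first.
const order = sortSplatsBackToFront(
  [{ x: 0, y: 0, z: 1 }, { x: 0, y: 0, z: 5 }, { x: 0, y: 0, z: 3 }],
  { x: 0, y: 0, z: 0 },
  { x: 0, y: 0, z: 1 },
);
console.log(order); // order is [1, 2, 0]
```

Because depths are quantized, splats that land in the same bucket keep their input order, so the result is approximate; in exchange the cost is O(n + B) for B buckets rather than O(n log n) for a comparison sort. The order also depends on the camera, so with several viewpoints (as with ForgeViewpoint) each view would need its own sort.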