量子位
World models get an open-source foundation: Emu3.5 takes multimodal SOTA, outperforming Nano Banana
量子位· 2025-10-30 10:31
Core Insights
- The article covers the launch of Emu3.5, the latest open-source native multimodal world model developed by the Beijing Academy of Artificial Intelligence (BAAI) [1]
- Emu3.5 is designed to deepen understanding of dynamic physical worlds, moving beyond mere visual realism to comprehension of context and interactions [8][10]

Group 1: Model Capabilities
- Emu3.5 performs high-precision tasks such as erasing handwritten marks and generating dynamic 3D environments from a first-person perspective [2][3]
- The model excels at generating coherent, logical outputs, simulating dynamic physical worlds, and maintaining spatial consistency during user interactions [11][20]
- It can execute complex multi-step tasks, such as organizing a desktop by following a series of instructions, demonstrating its grasp of long-horizon sequences and spatial relationships [23][24][28]

Group 2: Technical Innovations
- Emu3.5 is a 34-billion-parameter model built on a standard decoder-only Transformer architecture, handling tasks from visual storytelling to image editing [31]
- The model was pre-trained on over 10 trillion tokens of multimodal data, primarily sourced from internet videos, enabling it to learn temporal continuity and causal relationships [32]
- A powerful visual tokenizer with a vocabulary of 130,000 visual tokens enables high-fidelity image reconstruction at resolutions up to 2K [33]

Group 3: Performance and Comparisons
- Emu3.5 matches or surpasses Gemini-2.5-Flash-Image on several authoritative benchmarks, particularly in text rendering and multimodal generation tasks [18]
- Its consistency of content and style across multiple images and instructions is rated at the industry's top level [29]

Group 4: Future Implications
- The open-source release lets global developers and researchers build on Emu3.5's capabilities without starting from scratch, with the potential to transform various industries [36]
- The model's advances in generating realistic video and powering intelligent agents open broad possibilities for practical applications across sectors [37]
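As a concrete illustration of the architecture described above, the sketch below shows how a native multimodal, decoder-only model can place text and visual tokens in a single autoregressive sequence. Only the 130,000-token visual vocabulary comes from the article; the text vocabulary size, the begin/end-of-image markers, and the ID layout are assumptions for illustration.

```python
# Hypothetical token layout for a native multimodal, decoder-only model:
# text tokens and visual tokens share one vocabulary and one sequence.
TEXT_VOCAB = 100_000                    # assumed text vocabulary size (illustrative)
VISUAL_VOCAB = 130_000                  # visual vocabulary size reported for Emu3.5
BOI, EOI = TEXT_VOCAB, TEXT_VOCAB + 1   # hypothetical begin/end-of-image markers

def visual_token(i: int) -> int:
    """Map a visual codebook index into the shared ID space, past text and markers."""
    assert 0 <= i < VISUAL_VOCAB
    return TEXT_VOCAB + 2 + i

def interleave(text_ids, image_codes):
    """Build one training sequence: text tokens, then an image as visual tokens."""
    return list(text_ids) + [BOI] + [visual_token(i) for i in image_codes] + [EOI]

seq = interleave([5, 42, 7], [0, 129_999])
print(seq)  # [5, 42, 7, 100000, 100002, 230001, 100001]
```

In this layout the model predicts image content the same way it predicts text: one next token at a time over the shared vocabulary, which is what makes a single Transformer sufficient for both modalities.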
Google's revenue lifted by Nano Banana: quarterly revenue tops $100 billion for the first time; Gemini app hits 650 million monthly active users
量子位· 2025-10-30 10:31
Core Insights
- Google's quarterly revenue surpassed $100 billion for the first time, reaching $102.3 billion, up 16% year over year [12][22]
- AI-driven growth is evident: the Gemini app has 650 million monthly active users and processes 7 billion tokens per minute [5][24]
- Net profit rose to $34.98 billion, up 33% from the prior year, with an operating margin of 30.5% [12][18]

Group 1: Financial Performance
- Total revenue for Q3 2025 was $102.3 billion, a historic milestone [12]
- Net income reached $34.98 billion, with earnings per share (EPS) of $2.87, up 35% year over year [12][18]
- Google Services revenue was $87.05 billion, up 14% year over year, while Google Cloud revenue grew 34% to $15.16 billion [12][26]

Group 2: AI and Product Development
- The Gemini AI model has been commercialized at scale, with significant user engagement and processing volume [22][23]
- Google Workspace has integrated Gemini AI, enhancing productivity tools for enterprise clients [25]
- Demand for AI-related services is rising, with Google Cloud's AI product suite driving revenue growth [27]

Group 3: Investment and Future Outlook
- Google plans to raise 2025 capital expenditure to approximately $91-93 billion, focused on AI infrastructure [30][31]
- The company is also investing in energy infrastructure, including a partnership to restart a nuclear power plant to power its data centers [32][36]
- The rapid adoption of generative AI is creating unprecedented energy demands across the tech industry, prompting companies to rethink their energy strategies [36]
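The headline figures above can be sanity-checked with a few lines of arithmetic. The prior-year revenue below is implied by the stated 16% growth, and the net margin is derived here rather than quoted in the article.

```python
# Sanity check of the reported figures (all in billions of USD).
revenue_q3_2025 = 102.3   # reported quarterly revenue
yoy_growth = 0.16         # reported year-over-year growth
net_income = 34.98        # reported net profit

# Prior-year quarter implied by the growth rate, and the derived net margin.
implied_prior_revenue = revenue_q3_2025 / (1 + yoy_growth)
net_margin = net_income / revenue_q3_2025

print(f"implied Q3 2024 revenue: ${implied_prior_revenue:.1f}B")  # about $88.2B
print(f"net margin: {net_margin:.1%}")                            # about 34.2%
```

The derived net margin (about 34%) sits above the reported 30.5% operating margin, as expected, since net income includes non-operating items such as investment gains.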
ByteDance releases a general-purpose game agent: trained on 500 billion tokens, it beats GPT-5 using only keyboard and mouse
量子位· 2025-10-30 10:31
Core Insights
- The article covers Game-TARS, a general-purpose game agent from ByteDance's Seed team that plays games as varied as Minecraft, Temple Run, and Stardew Valley, and adapts to unseen 3D web games via zero-shot transfer [3][4][5]

Group 1: Game-TARS Overview
- Game-TARS uses a unified, scalable keyboard-mouse action space for large-scale pre-training across operating systems, the web, and simulated environments, leveraging over 500 billion tokens of labeled multimodal training data [4][20]
- The agent outperforms existing models such as GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet in FPS, open-world, and web games [5][29]

Group 2: Innovation and Design
- The core innovation is that Game-TARS operates like a human, with keyboard and mouse, rather than calling predefined game functions, enabling more natural interaction [6][9]
- Game-TARS centers on Human Actions, decoupling its action instruction set from any specific application or operating system and aligning it directly with how humans interact [9][10]

Group 3: Training Process
- Unlike traditional game bots, Game-TARS integrates visual perception, strategic reasoning, action execution, and long-term memory into a single vision-language model (VLM) [12][13]
- Training follows a two-phase approach of continuous pre-training and post-training, with over 20,000 hours and approximately 500 billion tokens of game data used for large-scale pre-training [15][20][22]

Group 4: Experimental Validation
- Tests in Minecraft validated the unified action space and large-scale continuous pre-training, showing improved performance over previous expert models [24][28]
- Game-TARS scales well in both training and inference, strengthening its capabilities across tasks and environments [31][34]
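A minimal sketch of what a unified keyboard-mouse action space might look like in practice: every action is serialized to a short text primitive that any game or OS layer can interpret, so the agent's instruction set is decoupled from specific applications. The primitive names and grammar here are assumptions for illustration, not the paper's actual specification.

```python
# Sketch of a unified, application-independent action space: the model emits
# text primitives (key presses, mouse moves, clicks) instead of game API calls.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str    # "key", "mouse_move", or "click"
    arg: tuple   # payload, e.g. ("w",) or (dx, dy)

def encode(a: Action) -> str:
    """Serialize an action to the text form a language model would emit."""
    if a.kind == "key":
        return f"key_press({a.arg[0]})"
    if a.kind == "mouse_move":
        return f"mouse_move({a.arg[0]}, {a.arg[1]})"
    if a.kind == "click":
        return f"click({a.arg[0]})"
    raise ValueError(f"unknown action kind: {a.kind}")

# A short trajectory: walk forward, adjust the camera, left-click.
trajectory = [Action("key", ("w",)), Action("mouse_move", (12, -4)), Action("click", ("left",))]
print("; ".join(encode(a) for a in trajectory))
# key_press(w); mouse_move(12, -4); click(left)
```

Because these primitives mirror human input rather than any game's internal API, the same decoder can act in Minecraft, a browser game, or a desktop application without a per-game adapter, which is the portability property the article attributes to Game-TARS.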
Agnes: no general-purpose agents | A conversation with consumer AI application platform Agnes AI
量子位· 2025-10-30 08:39
Core Insights
- Multi-agent systems have emerged as a significant trend in AI, enhancing the efficiency and effectiveness of AI applications [2][3]
- Agnes AI, a product of SapiensAI, has gained traction with over 300 million registered users and 200,000 daily active users within four months of launch [7][6]

Group 1: Agnes AI Features
- Agnes AI integrates functions including Deep Research, Wide Research, AI Design, AI Slides, and AI Sheets, covering a range of user needs [8][14]
- Deep Research performs in-depth analysis through iterative questioning, while Wide Research dispatches multiple agents to handle large-scale tasks in parallel [14][16]
- The platform emphasizes understanding user intent and task complexity to optimize how tasks are assigned to agents [15][16]

Group 2: Market Position and User Base
- Agnes AI targets young users and professionals, particularly in mobile and web-based work settings, promoting a lightweight approach to productivity [7][41]
- The product aims to replace traditional office tools and offers users a free quota, which aids acquisition and retention [40][56]
- The AI office market is expected to grow significantly, with traditional products facing disruption from AI-native solutions like Agnes [42][44]

Group 3: Competitive Advantages
- Agnes AI's multi-agent architecture executes tasks in parallel, improving speed and efficiency over single-agent systems [25][27]
- The product's design prioritizes user experience, targeting rapid response times and high-quality outputs, which are critical in competitive markets [22][36]
- The company keeps customer acquisition costs low and aims to capture a large share of users who have yet to adopt AI tools [50][52]

Group 4: Future Outlook
- The AI market is expected to evolve rapidly, with Agnes AI positioned to capitalize on the shift toward AI-native applications [42][46]
- The company aims to become a leading player in AI consumer apps, exceeding the capabilities of existing products like ChatGPT and Perplexity [63][64]
- Agnes AI's long-term goal is to make AI tools more accessible globally, particularly in developing regions, expanding its user base [57][66]
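To make the parallelism claim concrete, here is a minimal sketch of why fanning independent subtasks out to concurrent workers beats running them sequentially. The subtask names and timings are invented, and a real system would call model APIs rather than sleep.

```python
# Why a multi-agent fan-out wins on wall-clock time: independent subtasks
# run concurrently instead of back-to-back.
from concurrent.futures import ThreadPoolExecutor
import time

def run_subtask(name: str) -> str:
    time.sleep(0.1)  # stand-in for one agent's work (e.g., a search or a draft)
    return f"{name}: done"

subtasks = ["research", "design", "slides", "sheets"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_subtask, subtasks))  # preserves input order
parallel_time = time.perf_counter() - start

print(results)
print(f"wall-clock: ~{parallel_time:.1f}s vs ~{0.1 * len(subtasks):.1f}s sequentially")
```

Four 0.1-second subtasks finish in roughly 0.1 seconds instead of 0.4, and the same pattern scales to the "Wide Research" style of dispatching many agents at once described above.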
A model that lets robots learn about the world by "imagining" arrives, jointly produced by a Physical Intelligence co-founder's research group and Chen Jianyu's team at Tsinghua
量子位· 2025-10-30 08:39
Core Insights
- The article covers Ctrl-World, a controllable generative world model for robot manipulation developed in collaboration between Stanford University and Tsinghua University, which significantly improves robot task performance in simulated environments [4][12]

Group 1: Model Overview
- Ctrl-World lets robots run task simulations, strategy evaluation, and self-iteration inside an "imagination space" [5]
- Using zero real-robot data, the model raises instruction-following success rates from 38.7% to 83.4%, an average improvement of 44.7 percentage points [5][49]
- The related paper, "CTRL-WORLD: A CONTROLLABLE GENERATIVE WORLD MODEL FOR ROBOT MANIPULATION", has been published on arXiv [5]

Group 2: Challenges Addressed
- The model tackles two main challenges in robot training: costly, inefficient strategy evaluation and the inadequacy of real-world data for strategy iteration [7][9]
- Traditional methods require extensive real-world testing, which is costly and time-consuming and often leads to mechanical failures and high operating costs [8][9]
- Existing models struggle in open-world scenarios, particularly in active interaction with advanced policies [10]

Group 3: Innovations in Ctrl-World
- Ctrl-World introduces three key innovations: multi-view joint prediction, frame-level action control, and pose-conditioned memory retrieval [13][20]
- Multi-view joint prediction reduces hallucination by combining third-person and wrist-mounted views, improving the accuracy of generated future trajectories [16][23]
- Frame-level action control establishes a tight causal link between actions and visual outcomes, enabling centimeter-level precision in simulation [24][29]
- Pose-conditioned memory retrieval preserves long-term consistency, keeping simulations coherent over extended horizons [31][36]

Group 4: Experimental Validation
- Experiments on the DROID robot platform show Ctrl-World outperforming traditional models in generation quality, evaluation accuracy, and strategy optimization [38][39]
- Virtual performance metrics correlate strongly with real-world outcomes, with a correlation coefficient of 0.87 for instruction-following rates [41][44]
- The model adapts to unseen camera layouts and generates coherent multi-view trajectories, demonstrating strong generalization [39]

Group 5: Future Directions
- Despite its successes, Ctrl-World has room to improve, particularly in handling complex physical scenarios and reducing sensitivity to initial observations [51][52]
- Future plans include combining video generation with reinforcement learning for autonomous exploration of optimal strategies, and expanding the training data to cover more complex environments [53]
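The 0.87 correlation reported above is the kind of figure obtained by comparing a policy's success rate inside the world model with its success rate on the real robot. The sketch below computes a Pearson correlation over hypothetical rate pairs; the numbers are invented for illustration.

```python
# Pearson correlation between in-"imagination" success rates and real-robot
# success rates, the validation metric the article describes.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sim_rates  = [0.30, 0.55, 0.62, 0.80, 0.90]  # hypothetical in-model success rates
real_rates = [0.25, 0.50, 0.70, 0.75, 0.88]  # hypothetical real-robot success rates
print(round(pearson(sim_rates, real_rates), 2))
```

A coefficient near 1 means the world model ranks policies the same way the real robot does, which is what justifies evaluating and iterating on policies in simulation before ever touching hardware.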
量子位 2025 annual lists enter the final application sprint: company, product, and people categories now open for submissions
量子位· 2025-10-30 08:39
Core Viewpoint
- The article announces the launch of the "2025 Artificial Intelligence Annual List" to recognize and celebrate the individuals, companies, and products leading the transformation of the AI industry [1][2]

Group 1: Awards and Categories
- The evaluation covers three main dimensions (companies, products, and individuals) across five award categories [2][5]
- The categories are:
  - 2025 AI Annual Leading Enterprises
  - 2025 AI Annual Potential Startups
  - 2025 AI Annual Outstanding Products
  - 2025 AI Annual Outstanding Solutions
  - 2025 AI Annual Focus Figures [5][6]

Group 2: Evaluation Criteria
- Leading Enterprises must be registered in China or primarily serve the Chinese market and demonstrate significant achievements in technology innovation, product deployment, and market expansion [9]
- Potential Startups are companies with innovative AI solutions that have earned market recognition and show strong growth potential [10]
- Outstanding Products are evaluated on technological innovation, market impact, and industry leadership [11]
- Outstanding Solutions are assessed on innovative application and effectiveness in driving industry transformation [13][15]

Group 3: Application Process
- Applications are open now through November 17, 2025, with results announced at the MEET2026 Smart Future Conference [20]
- Applicants must meet criteria covering their influence in the AI sector and their contributions to technology and commercialization [21][22]

Group 4: Conference Details
- The MEET2026 Smart Future Conference, themed "Symbiosis Without Boundaries, Intelligence to Ignite the Future", will gather leaders from technology, industry, and academia to discuss transformative changes in the AI sector [24][25]
Some say it could be the "Apple of the embodied intelligence era": what makes this company a contender?
量子位· 2025-10-30 06:17
Core Viewpoint
- The article highlights the successful launch and rapid sell-out of the Booster K1, an entry-level embodied-intelligence development platform, emphasizing the durability, portability, and comprehensive development capabilities that saw its orders sell out shortly after release [1][5][6]

Product Features and Market Position
- The Booster K1 has completed multiple rounds of mass production and delivery, with a robust toolchain supporting complex development scenarios [6][9]
- It has been validated in international robotics competitions, demonstrating long-term reliability and performance [7][25]
- The robot offers 22 degrees of freedom, stands approximately 95 cm tall, and weighs 19.5 kg, balancing portability with physical stability [9][10]

Target Audience and Versions
- Booster K1 ships in three versions (Geek, Education, and Professional), all supporting secondary development and a range of control algorithms [10][11]
- The company aims to attract developers, educators, and competition participants, positioning itself as a leader in the embodied-intelligence market [8][12]

Ecosystem and Development Support
- The company has established a comprehensive developer support system, including open hardware, a complete software toolchain, and a variety of pre-configured agent applications [12][13]
- The "Sailing Plan" initiative offers free development tools and courses to lower the barrier to entry for developers [14]

Educational and Competitive Initiatives
- A "Hundred Cities and Ten Thousand Schools" plan will partner with numerous educational institutions over the next three years to promote robotics education globally [18]
- The company has built a complete ecosystem for robotics competitions, leveraging its robot-soccer experience to support event execution and commercialization [18][22]

Strategic Vision and Platform Development
- The company envisions the Booster K1 as the core of a closed-loop ecosystem spanning teaching, learning, practicing, competing, and application [16][34]
- The strategic direction is a platform akin to an operating system for embodied intelligence, providing a collaborative environment for developers [31][33]

Competitive Landscape and Future Outlook
- The company draws parallels with successful tech giants like Microsoft and Apple, focusing on a platform that encourages developer engagement and cross-scenario adaptability [41]
- The rapid delivery and validation of the Booster K1 indicate that a usable, co-creative system architecture is taking shape, potentially leading to a true "humanoid operating system" [39][40]
Cursor releases its first coding model: 250 tokens/s code generation, reinforcement learning plus an MoE architecture
量子位· 2025-10-30 01:06
Core Insights
- Cursor has officially released its first in-house coding model, named Composer, as part of the Cursor 2.0 update [1][2]
- Composer is reported to complete complex tasks in about 30 seconds, a 400% speed increase over competitors [3][12]

Model Features
- Cursor 2.0 includes a native browser tool that lets the model test, debug, and iterate on code autonomously until it reaches correct results [4]
- Voice code generation lets users turn spoken ideas into code without typing [5]
- The interface shifts from a file-centric to an agent-centric model, allowing multiple agents to run simultaneously without interference [6][7]

Performance Metrics
- Composer generates code at 250 tokens per second, approximately twice as fast as current leading models such as GPT-5 and Claude Sonnet 4.5 [19][20]
- The model's reasoning and task-generalization capabilities are comparable to mid-tier frontier models [21]

Training Methodology
- Composer's performance is attributed to reinforcement learning, which lets the model learn from real programming tasks rather than static datasets [22][26]
- During training the model works directly inside a complete codebase, using production-grade tools to write, test, and debug code [27][28]

Practical Application
- Cursor 2.0 aims to be a practical AI system aligned closely with developers' daily workflows, enhancing its usability in real-world scenarios [35][36]
- The model has shown emergent behaviors, such as running unit tests and autonomously fixing code-formatting errors [31]

Transparency and Model Origin
- Questions remain about the transparency of Composer's foundation, including whether it is built on pre-existing models or trained entirely in-house [37][40]
- Cursor previously developed an internal model named Cheetah, used to test speed and system integration [42]
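The throughput claim above translates directly into wall-clock time. A back-of-the-envelope sketch, assuming a hypothetical 2,000-token code change and a baseline model at half Composer's reported speed:

```python
# At the article's 250 tokens/second, a patch takes roughly half the time it
# would at ~125 tokens/second (the "about twice as fast" comparison).
PATCH_TOKENS = 2_000  # hypothetical size of one generated code change

def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    return tokens / tokens_per_second

composer = seconds_to_generate(PATCH_TOKENS, 250.0)  # Composer's reported speed
baseline = seconds_to_generate(PATCH_TOKENS, 125.0)  # assumed half-speed baseline

print(f"Composer: {composer:.1f}s, baseline: {baseline:.1f}s, "
      f"speedup: {baseline / composer:.1f}x")
# Composer: 8.0s, baseline: 16.0s, speedup: 2.0x
```

In an agent loop that generates, tests, and regenerates code many times per task, this per-patch difference compounds, which is why raw token throughput matters so much for the interactive experience the article describes.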
Sora ships three new features in a row: one-click IP characters, plus a limited-time invite-code waiver to capture the Android market
量子位· 2025-10-30 01:06
Core Insights
- Sora has introduced three major new features: Character Cameo, video stitching, and a community leaderboard [1][12][13]
- The app has temporarily lifted its invitation-code requirement in the US, Canada, Japan, and South Korea, allowing direct registration [2][17]
- The limited-time opening is attributed to constraints on computing power [3]

Feature Summaries
- Character Cameo: the upgraded feature keeps non-human cameo characters consistent, including pets and animated figures, deepening user engagement [6][9]
- Video Stitching: users can combine two videos when generated content runs too short, adding versatility to video creation [12]
- Community Leaderboard: ranks the most-used cameo characters and the most-remixed videos, fostering community interaction [13]

Market Strategy
- The temporary removal of the invitation-code requirement coincides with the launch of Sora's Android version, aiming to expand the user base rapidly and capture market share [18]
- Sora initially relied on viral invitations, with each activated account able to share four invite codes, which generated significant buzz but also a gray market for codes [15][16]
History made again: Nvidia's market cap tops $5 trillion overnight, up 56% this year, as Jensen Huang becomes the world's eighth-richest person
量子位· 2025-10-30 01:06
Core Viewpoint
- Nvidia has become the first company in history to surpass a market capitalization of $5 trillion, a significant milestone for the tech industry [1][10]

Group 1: Market Performance
- On October 29, Nvidia's stock rose 5.44%, hitting an intraday high of $212.19 per share and closing at $207.04, for a market cap of $5.03 trillion [2][10]
- Since the start of 2025, Nvidia's stock has gained 56%, outpacing other major companies [5][39]
- Nvidia's market cap now exceeds the combined total of several major tech companies, including AMD, Intel, and TSMC, as well as entire sectors of the S&P 500 [5][6]

Group 2: Growth Trajectory
- Nvidia's market value climbed from $1 trillion to $5 trillion in just two and a half years, a pace unmatched by other tech giants [9][23]
- The company first reached a $1 trillion valuation in May 2023, hit $3 trillion in June 2024, and reached $4 trillion a little over a year later [22][23]
- The jump from $4 trillion to $5 trillion took only three months, underscoring Nvidia's exceptional market performance [23][24]

Group 3: Key Drivers of Growth
- The latest surge followed announcements at the GTC developer conference, where CEO Jensen Huang unveiled a series of technological innovations and partnerships [25][39]
- Highlights included plans to build new supercomputers with the U.S. Department of Energy and the Blackwell chip series, whose production is expected to scale up significantly [26][27][31]
- Anticipated revenue from the new products is projected to reach $500 billion by the end of next year, reflecting a major shift of global computing infrastructure toward Nvidia's accelerated computing model [31][33]

Group 4: Competitive Landscape
- Nvidia's rapid ascent has opened a substantial gap over its closest rivals: Microsoft at $4.03 trillion and Apple at $4.00 trillion [13][14]
- The company is a central player in the AI boom, with its GPUs underpinning the infrastructure of leading AI firms such as OpenAI and Google DeepMind [39][40]
- Strategic investments and partnerships, including a potential $100 billion investment in AI data centers, further cement its leadership in the AI sector [40][41]
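One derived figure ties the reported closing price to the reported market cap: the implied share count. This is arithmetic on the article's numbers, not a statistic the article quotes.

```python
# Shares outstanding implied by the reported close and market cap.
close_price = 207.04   # USD per share, October 29 close (reported)
market_cap = 5.03e12   # USD (reported)

implied_shares = market_cap / close_price
print(f"implied shares outstanding: {implied_shares / 1e9:.1f} billion")
# implied shares outstanding: 24.3 billion
```

The same relation (market cap = price x shares) explains why a 5.44% single-day price move can add more than $250 billion of value at this scale.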