Give NVIDIA's desktop supercomputer an Apple Mac Studio and it flies: inference speed soars to 277%
量子位· 2025-10-17 04:58
Core Viewpoint
- EXO Labs has developed a new framework that enhances large model inference speed by combining NVIDIA's DGX Spark with Apple's M3 Ultra, achieving a speedup of up to 2.77x for model deployment [1][5][18]

Group 1: Technology and Implementation
- The framework utilizes a PD (Prefill and Decode) separation approach, where the DGX Spark handles the Prefill phase due to its high computational power, while the M3 Ultra manages the Decode phase, benefiting from its high memory bandwidth [11][18]
- The Prefill phase's computational demand grows quadratically with prompt length, while the Decode phase is primarily limited by memory bandwidth, making the separation of tasks advantageous [8][11]
- EXO Labs employs a streaming transmission method for the KV cache, allowing computation and data transfer to overlap between the two devices, which minimizes communication costs [16][18]

Group 2: Performance Metrics
- The combination of DGX Spark and M3 Ultra yields significant performance improvements: Prefill speed increases to 3.79x that of the M3 Ultra alone, and Decode speed improves to 3.37x that of the DGX Spark [18][19]
- Overall, the combined system reduces total processing time to 2.32 seconds, a 2.8x speedup compared to using the M3 Ultra alone [19]

Group 3: Industry Context
- NVIDIA is also exploring a similar PD separation technique with its upcoming Rubin CPX platform, which will pair a compute-intensive processor for Prefill with a high-bandwidth-memory chip for Decode [20]
- The recent delivery of DGX Spark systems to notable figures in the tech industry indicates growing interest and investment in advanced AI inference technologies [22]
- Apple's latest M5 chip shows improvements in AI performance, but comparisons suggest the M3 Ultra may hold more value in the current AI hardware landscape [26][30]
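The logic of the split can be seen in a back-of-the-envelope latency model: one device is compute-rich (fast Prefill, slow Decode), the other bandwidth-rich (slow Prefill, fast Decode), and streaming the KV cache hides the transfer behind Prefill compute. The sketch below uses hypothetical placeholder timings, not EXO Labs' measured numbers.

```python
# Illustrative latency model for PD (prefill/decode) separation.
# All timings below are hypothetical placeholders, not EXO Labs' data.

def total_latency(prefill_s: float, decode_s: float, transfer_s: float,
                  overlap: bool) -> float:
    """End-to-end time when prefill and decode run on different devices
    and the KV cache must move between them."""
    if overlap:
        # Streaming the KV cache layer by layer hides the transfer
        # behind prefill compute, so it adds (almost) nothing.
        return prefill_s + decode_s
    return prefill_s + transfer_s + decode_s

# Hypothetical single-device baselines (seconds): one machine is
# compute-bound on prefill, the other bandwidth-bound on decode.
m3_ultra_alone = 4.0 + 1.5   # slow prefill, fast decode
dgx_spark_alone = 1.0 + 5.0  # fast prefill, slow decode

# Combined: DGX Spark prefills, M3 Ultra decodes, KV cache streamed.
combined = total_latency(prefill_s=1.0, decode_s=1.5,
                         transfer_s=2.0, overlap=True)

print(combined)                   # 2.5 s with overlap
print(m3_ultra_alone / combined)  # ~2.2x over the M3 Ultra alone
```

Without the overlap, the same split would pay the full 2.0 s transfer and lose most of the benefit, which is why the streaming KV cache matters as much as the device pairing.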
AGI now has a quantitative standard! Bengio leads the definition; current progress: 58%
量子位· 2025-10-17 04:58
Core Viewpoint
- The article presents a measurable definition of Artificial General Intelligence (AGI): an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult, emphasizing the need for comprehensive evaluation across multiple cognitive domains [2][4]

Evaluation Framework
- A quantitative method was designed to assess the distance of current AI from AGI, referencing the Cattell-Horn-Carroll (CHC) theory, which breaks down human cognitive ability into ten independent yet interconnected core cognitive domains [6][8]
- The assessment includes a question bank of over 500 items, with a scoring system in which a total score of 100 indicates AGI level, and higher scores reflect closer proximity to AGI [8][9]

Current AI Performance
- The evaluation revealed that while AI has made significant progress, it still falls short of AGI: GPT-4 scored only 27 and GPT-5 scored 58, a 115% increase over two years but still below the passing line of 100 [10][11][13]
- Current AI shows strong performance in knowledge, reading and writing, and mathematics, with GPT-5 scoring above 8 in these areas, reflecting its strengths in knowledge retention and symbolic processing [18][21][22]

Cognitive Shortcomings
- Significant deficiencies were identified in foundational cognitive areas such as perception, memory, and reasoning, which cannot be compensated for merely by increasing data scale [23][30]
- In the visual and auditory domains, both GPT-4 and GPT-5 performed poorly, with GPT-4 scoring 0 and GPT-5 achieving only minimal recognition capability [24][26]
- Long-term memory storage and retrieval were also highlighted as critical weaknesses, with neither model able to demonstrate effective long-term information retention [27][29]

Misleading Capabilities
- Some AI models appear to possess multi-tasking abilities but are essentially masking their shortcomings through technical means, such as expanded context windows, which do not equate to true long-term memory [30][32]
- The evaluation framework specifically excludes external tools, focusing solely on the intrinsic cognitive capabilities of AI systems, thereby revealing the limitations of models that rely on external knowledge sources [33][34]
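The scoring scheme described above (ten CHC domains, a total of 100 marking AGI level) can be sketched as a simple aggregation. The domain names and the per-domain profile below are our own illustrative stand-ins that merely echo the pattern the article reports (strong knowledge/reading/math, near-zero perception and long-term memory), not the paper's actual rubric.

```python
# Toy aggregation in the spirit of the article's AGI score: ten CHC
# cognitive domains, each graded 0-10, summing to a 0-100 total.
# Domain names and per-domain scores are illustrative only.

CHC_DOMAINS = [
    "knowledge", "reading_writing", "math", "reasoning", "memory_storage",
    "memory_retrieval", "working_memory", "visual", "auditory", "speed",
]

def agi_score(per_domain: dict[str, float]) -> float:
    """Sum of per-domain scores, each clamped to [0, 10]; 100 = AGI level."""
    return sum(min(max(per_domain.get(d, 0.0), 0.0), 10.0)
               for d in CHC_DOMAINS)

# Hypothetical profile: strong symbolic skills, weak perception/memory.
gpt5_like = {
    "knowledge": 9, "reading_writing": 9, "math": 8.5, "reasoning": 6,
    "memory_storage": 1, "memory_retrieval": 2, "working_memory": 7,
    "visual": 3, "auditory": 2.5, "speed": 10,
}
print(agi_score(gpt5_like))  # 58.0
```

The clamp per domain is the point of the design: a model cannot buy its way to AGI by maxing out a few strong domains, since each domain contributes at most 10 of the 100 points.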
China's first domestically made eSIM phone is here
量子位· 2025-10-17 01:04
Core Viewpoint
- OPPO has launched the Find X9 series, featuring the first domestic eSIM smartphone and advanced camera capabilities, including 8K photography and AI functionality [1][3][25]

Group 1: Product Features
- The Find X9 series includes the Pro version, the first domestic smartphone to support eSIM technology [1]
- It is equipped with a 200-megapixel camera and offers the world's first 8K full-focus ultra-high-definition photography and 4K Live Photo capability [3][26]
- The Pro version introduces a Hasselblad 200-megapixel telephoto lens, the first mobile imaging lens to receive Hasselblad optical certification [5][37]

Group 2: AI Functionality
- The Find X9 series incorporates dual physical AI buttons for quick access to AI memory and real-time dialogue features [6][10]
- The "One-Click AI Flash Memory" function lets users capture key information easily, with upgrades in ColorOS 16 enhancing this capability [12][13]
- The "One-Click Question Screen" feature has evolved into "Real-Scene AI Dialogue," enabling users to interact with the AI by pointing at objects in real time [23]

Group 3: Pricing and Availability
- The standard Find X9 starts at 4,399 yuan and the Pro version at 5,299 yuan, with sales starting on the 22nd [6][8]
- The pricing positions the Find X9 competitively against flagship models from brands such as Apple and Xiaomi [42][43]
OpenAI's latest venture: hiring a black hole physicist
量子位· 2025-10-17 01:04
Core Insights
- OpenAI has launched a new research team called OpenAI for Science, focused on developing AI systems that accelerate discoveries in mathematics and physics [1]
- The hiring of physicist Alex Lupsasca, a recipient of the New Horizons in Physics Prize, highlights the transformative potential of AI in scientific research, particularly with the advent of GPT-5 Pro [2][5]
- GPT-5 Pro demonstrated its capability by solving complex problems in significantly less time than human researchers, indicating a paradigm shift in scientific methodology [4][10]

Group 1
- Alex Lupsasca initially believed AI would take a long time to reach the frontier of research, but the emergence of GPT-5 Pro changed his view [2]
- Lupsasca found that GPT-5 Pro could work out the precise form of a new symmetry in black hole perturbation theory in just 30 minutes, a task that had taken him several days [4][10]
- The AI's ability to derive complex equations and provide structured reasoning impressed Lupsasca, leading him to believe in AI's potential to revolutionize scientific research [5][19]

Group 2
- Lupsasca's previous work includes the Black Hole Explorer (BHEX) project, which aims to send a satellite into orbit to capture high-resolution images of black holes [28][29]
- The BHEX project is slated to launch in 2032 and is expected to move black hole research into a new era of precision [29][30]
- Lupsasca has received multiple honors for his contributions to black hole imaging, including the IUPAP Young Scientist Prize in 2024 [30][31]
Fei-Fei Li unveils a brand-new world model that runs on a single GPU!
量子位· 2025-10-17 01:04
Core Insights
- The article discusses the launch of RTFM (A Real-Time Frame Model) by Fei-Fei Li's team: it runs in real time, has persistence, and maintains 3D consistency, all on a single H100 GPU [1][2]

Group 1: Model Features
- RTFM is designed around three core principles: efficiency, scalability, and persistence; it performs real-time inference at interactive frame rates using only one H100 GPU [2]
- The model supports continuous interaction with users, permanently storing all scenes and thus creating a persistent 3D world that does not disappear when the viewpoint changes [3]

Group 2: Computational Requirements
- Powerful world models require significant computational resources to reconstruct, generate, and simulate persistent, interactive, physically accurate environments, which could revolutionize industries from media to robotics [5]
- The compute demand of generative world modeling is expected to exceed that of today's large language models: 60 fps of 4K interactive video requires generating over 100,000 tokens per second [7][8]

Group 3: Design Philosophy
- The team believes methods that scale elegantly with increasing compute will dominate the AI field, benefiting from the exponential decline in computing costs over decades [9]
- The goal was a highly efficient generative world model that can be deployed immediately and scale as compute grows, while being driven by a single H100 GPU [10]

Group 4: Learning Renderer
- RTFM takes a novel approach, using a single neural network to generate 2D images from one or more input images without relying on explicit 3D representations [12]
- It uses an autoregressive diffusion-transformer architecture trained on vast amounts of video data, predicting subsequent frames from history [13]

Group 5: Memory and Persistence
- RTFM addresses persistence by associating each frame with a pose in 3D space, generating new frames conditioned on a provided pose [18]
- Its memory is spatially organized, letting it maintain a persistent memory of the world without explicitly predicting the 3D geometry of objects [19]
- A "context juggling" technique lets RTFM maintain long-term memory of large worlds during extended interactions without requiring extensive computational resources [20]
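The "spatially organized memory + context juggling" idea described above can be sketched as follows: each generated frame is stored with its camera pose, and the context for the next frame is assembled from the nearest stored poses rather than from all history, so the per-frame compute budget stays fixed no matter how large the world grows. Everything below is our own illustrative stand-in, not RTFM's actual implementation.

```python
# Toy spatial memory: frames keyed by pose; context selection picks the
# nearest stored frames, bounding per-step compute regardless of history.
import math

class SpatialMemory:
    def __init__(self, context_size: int):
        self.frames: list[tuple] = []     # (pose_xyz, frame_data) pairs
        self.context_size = context_size  # fixed budget => bounded compute

    def add(self, pose, frame_data):
        self.frames.append((pose, frame_data))

    def context_for(self, query_pose):
        """Pick the stored frames whose poses are closest to the query."""
        return sorted(self.frames,
                      key=lambda f: math.dist(f[0], query_pose)
                      )[: self.context_size]

mem = SpatialMemory(context_size=2)
mem.add((0, 0, 0), "frame_a")     # near the origin
mem.add((10, 0, 0), "frame_b")    # far away
mem.add((0.5, 0, 0), "frame_c")   # near the origin again

# Revisiting the origin retrieves nearby frames, however old they are:
print([f[1] for f in mem.context_for((0, 0, 0))])  # ['frame_a', 'frame_c']
```

This is what gives the appearance of persistence: a region revisited after a long excursion still conditions generation on the frames made there earlier, even though those frames have long since left the recent-history window.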
Veo3.1 and Sora2 face off on the same prompts
量子位· 2025-10-16 09:34
Core Viewpoint
- Google has released Veo3.1, a direct competitor to Sora2, emphasizing enhanced creative control and audio-generation capabilities [1][3][5]

Group 1: Features and Improvements
- Veo3.1 brings significant improvements in creative control, with deeper understanding of prompts and more realistic texture capture [2][7]
- The update adds audio generation, improving the integration of audio with video content [3][11]
- Key functions include "ingredients to video," "frames to video," and "scene extension," which let users build more complex narratives and keep character actions consistent [11][12][13][14]

Group 2: Performance Comparison
- In a head-to-head comparison, Veo3.1 shows superior visual realism and audio effects versus Sora2, particularly in generating detailed vehicle movement and sound effects [20][21]
- Users note that while Sora2 excels in character positioning and storytelling, Veo3.1 is stronger at text-to-video generation [28][29]
- Overall, each model has strengths and weaknesses: Veo3.1 focuses on physical realism, Sora2 on entertainment value [30][31]
Jensen Huang's eldest daughter makes her livestream debut, talking embodied intelligence
量子位· 2025-10-16 09:30
Shiling, from Aofeisi. QbitAI | WeChat official account QbitAI

You've all seen plenty of Jensen Huang, but have you seen his daughter talk about embodied intelligence?

Madison Huang, Jensen Huang's daughter, made her first public appearance in a livestreamed interview. As NVIDIA's Senior Director of Omniverse and Physical AI, she joined Lightwheel (光轮智能) CEO Chen Xie and Lightwheel's head of growth, Mustafa, for an in-depth discussion of how to close the gap between simulation and reality for robots.

Over the ninety-minute interview, the three made a series of key points:

Here are the details.

Using synthetic data and simulation to solve the robotics data bottleneck

Right at the start, host Edmar Mendizabal (Omniverse community manager) opened with a question many people are curious about:

How did the partnership between NVIDIA and Lightwheel begin?

Madison explained that many projects inside NVIDIA rely on Lightwheel's support. For example, Gear Lab is building generalist agent models, and the Seattle robotics lab runs many tasks involving contact-rich manipulation and precision assembly.

Synthetic data is essential to solving the robotics data dilemma.

Lightwheel's SimReady assets must be not only visually accurate but, more importantly, physically accurate.

NVIDIA and Lightwheel are jointly developing Isaac Lab Arena, a platform for benchmarking, evaluation, data collection, and large-scale reinforcement learning ...
Neural networks and symbolic systems unified! A University of Washington professor unifies AI logic into tensor representations
量子位· 2025-10-16 09:30
Core Viewpoint
- The programming languages currently used in AI are fundamentally flawed, and a new unified language called Tensor Logic is proposed to bridge the gap between logical reasoning and neural computation [1][10][18]

Group 1: Critique of Current AI Programming Languages
- Pedro Domingos criticizes existing AI programming languages, particularly Python, stating that it was "never designed for AI" and lacks support for automated reasoning and knowledge acquisition [11][12]
- Other languages such as LISP and Prolog, while enabling symbolic AI, suffer from scalability problems and lack support for learning [15]
- Attempts to combine deep learning with symbolic AI under the banner of neural-symbolic AI are deemed a poor integration of the two approaches [16][17]

Group 2: Introduction of Tensor Logic
- Tensor Logic aims to provide a unified framework for expressing neural networks and symbolic reasoning, allowing learning, reasoning, and knowledge representation to unfold within the same mathematical framework [18][19]
- The equivalence between logical rules and tensor operations means traditional symbolic reasoning can be recast as tensor computation, eliminating the need for specialized logic engines [21]

Group 3: Implementation of Tensor Logic
- Tensor Logic uses tensor equations to express a range of AI methods, including neural networks, symbolic AI, kernel methods, and probabilistic graphical models [33][40]
- Every statement in Tensor Logic is a tensor equation, which enables automatic differentiation and erases the distinction between program structure and model structure [25][28]
- The language allows a continuous transition from precise reasoning to fuzzy analogy by adjusting the temperature parameter of the activation functions, balancing logical reliability against neural-network generalization [31]
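The core equivalence the article describes (a logical rule is a tensor operation) can be made concrete with a classic example: the Datalog rule `path(x, z) <- edge(x, y), edge(y, z)` is a join over the shared variable `y`, which is exactly an einsum over Boolean relation tensors followed by a step function. This is our own toy rendering of the idea, not Domingos' implementation.

```python
# A logic rule as a tensor equation: the join over the shared index y
# becomes an einsum; the step function maps back to {0, 1} truth values.
import numpy as np

def step(t: np.ndarray) -> np.ndarray:
    """Threshold back to {0, 1}: logical truth after the join."""
    return (t > 0).astype(np.int8)

# edge relation as a 0/1 adjacency matrix over constants {0, 1, 2}:
# edge(0, 1) and edge(1, 2) hold.
edge = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [0, 0, 0]], dtype=np.int8)

# path(x, z) <- edge(x, y), edge(y, z): sum out the shared variable y.
path2 = step(np.einsum("xy,yz->xz", edge, edge))
print(path2)  # only path2[0, 2] is 1: the two-hop path 0 -> 1 -> 2

# Replacing step() with a temperature-controlled sigmoid turns the same
# equation into a differentiable "soft" rule: the continuum the article
# describes between precise reasoning and fuzzy analogy.
```

The appeal is that nothing changed representationally between the symbolic and neural readings: the same einsum that implements the Datalog join is also the building block of a linear layer, so one construct serves both worlds.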
A star embodied-intelligence company has just dissolved on the spot
量子位· 2025-10-16 07:53
Core Viewpoint
- The sudden dissolution of OneStar Robotics, a star startup in embodied intelligence, has raised eyebrows in the industry, especially given its recent high-profile funding and recruitment of a renowned CTO [2][3][4]

Company Overview
- OneStar Robotics was founded on May 9, 2025, by Li Xingxing, son of Geely founder Li Shufu, and was positioned as a key player in robotics for Geely [5][9][10]
- The company aimed to innovate in the embodied-intelligence field, focusing on practical applications rather than mere algorithmic demonstrations [12][13]

Recent Developments
- In July, OneStar Robotics announced the completion of a multi-hundred-million-yuan "friends and family" funding round, drawn primarily from Geely's ecosystem [15]
- The company appointed Ding Yan, a prominent researcher from Shanghai AI Lab, as CTO and co-founder, strengthening its technical capabilities [16]
- In August, a partnership was established with Fudan University to create a joint laboratory, and the first product, "Star Wheel 1," was launched [17]
- Another funding round closed on September 17, with participation from various market and industry investors, again totaling several hundred million yuan [18]

Dissolution Details
- Despite its rapid growth and significant investment, OneStar Robotics reportedly dissolved its team within just five months of founding, with many employees not even completing their probation period [22]
- The reasons for the dissolution remain unclear, but there are indications that the existing platform and business may return to Geely, while the technology team could pursue independent ventures [7][8]
A multimodal large model achieves pixel-level reasoning for the first time! 3B parameters beat a traditional 72B model; accepted at NeurIPS 2025
量子位· 2025-10-16 06:11
Submitted by the UniPixel team. QbitAI | WeChat official account QbitAI

A multimodal large model achieves pixel-level reasoning for the first time, covering referring, segmentation, and reasoning in a single model!

AI "image captioning" is easy by now, but even GPT-5 and Gemini 2.5 Pro can only "see the gist" and struggle with more precise object recognition and reasoning.

To address this, a research team from The Hong Kong Polytechnic University and Tencent ARC Lab proposes the first unified pixel-level multimodal large model: UniPixel.

Without further ado, let's look at what UniPixel can do:

A single UniPixel model handles all three tasks of object referring, pixel-level segmentation, and region reasoning, combining flexibility, precision, and scalability.

The paper has been accepted at NeurIPS 2025, and the code, data, and demo are fully open source!

More details below.

UniPixel redefines visual reasoning

Most traditional visual question answering and captioning systems reason over the whole image or video, lacking precise perception of specific regions or designated targets in the picture.

This not only limits their practical use in scenarios such as medical diagnosis, autonomous driving, and human-computer interaction, but also falls short of users' higher-order demands for controllability and interpretability.

Take an everyday task ...