量子位
Is Alexandr Wang leaving Meta? Zuckerberg responds
量子位· 2026-03-10 04:05
Core Viewpoint
- Recent rumors that Alexandr Wang is leaving Meta have been denied by both a Meta spokesperson and Mark Zuckerberg; Wang remains in charge of Meta Superintelligence Labs and TBD Lab, and his influence within the team is reportedly increasing [4][8].

Group 1: Rumors and Clarifications
- Rumors of Wang's departure spread widely, suggesting that Zuckerberg had lost confidence in him over delays to the new model "Avocado," which has been pushed back to Q1 2026 [9][12].
- Meta spokesperson Andy Stone called the media reports of Wang's departure "absurd" [7][34].
- Despite the denial, speculation continues about internal conflicts and restructuring within Meta's AI department [20][21].

Group 2: Internal Dynamics and Restructuring
- Meta recently announced a new AI engineering organization led by Maher Saba that operates independently of Wang, indicating a shift in control over AI development [22][23].
- The new team aims to build a "data engine" to improve the training and quality of Meta's AI models, with resources being redirected away from Wang's oversight [25][26].
- Wang has reportedly clashed with key executives over the direction of AI development, with some preferring to leverage user data from Facebook and Instagram rather than pursue cutting-edge models [30][31].

Group 3: Background on Alexandr Wang
- Alexandr Wang, co-founder and former CEO of Scale AI, joined Meta in 2025 after Meta invested approximately $14.3 billion in Scale AI, making him a prominent figure in Meta's AI strategy [45][50].
- At just 29 years old, Wang is recognized as a significant player in the tech industry and has made substantial changes to Meta's AI operations since his arrival [46][52].
- Under Wang's leadership, Meta shifted from iterating on open-source models to developing new foundation models intended to compete directly with OpenAI and Google [55].
A robot tidies a living room fully autonomously! New end-to-end skills from the $39 billion robotics startup, with Nvidia doubling down
量子位· 2026-03-10 04:05
Core Insights
- Figure, a robotics company, has reached a $39 billion valuation, with continued backing from Nvidia [28][34].
- The company has developed the Helix 02 system, which enables its robots to autonomously perform complex tasks in household settings [9][11].

Group 1: Technological Advancements
- Helix 02 allows robots to execute a full range of tasks, including cleaning and organizing living spaces, without human intervention [11][25].
- The system uses a unified visual-motor neural network, integrating sensory input and motor output seamlessly [13][14].
- Its key components are System 0 for full-body control, System 1 for translating perception into action, and System 2 for semantic reasoning [17][24].

Group 2: Operational Capabilities
- The robot can perform 61 different operations autonomously, such as cleaning surfaces, organizing items, and interacting with a variety of objects [11][26].
- It demonstrates advanced capabilities such as dynamic manipulation of flexible items and precise movement control in confined spaces [26][27].
- New tasks can be learned simply by adding new data, highlighting the system's adaptability and efficiency [25].

Group 3: Company Background and Funding
- Figure was founded in May 2022 by Brett Adcock, a serial entrepreneur with a background in electric aircraft [28].
- The company initially partnered with OpenAI on AI model integration but later parted ways over integration challenges [33].
- Despite the split, Figure raised over $1 billion in Series C funding, sharply boosting its valuation and attracting investment from major tech firms [34].
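The System 0 / System 1 / System 2 split described above can be illustrated with a toy hierarchical control loop. All class names, methods, and return values below are invented for illustration; they are not Figure's actual Helix 02 interfaces.

```python
# Toy sketch of a three-level robot control stack, loosely modeled on the
# System 0 / System 1 / System 2 split described above. Everything here is
# an illustrative assumption, not Figure's Helix 02 API.

from dataclasses import dataclass

@dataclass
class Observation:
    image: str          # stand-in for camera input
    task_prompt: str    # natural-language instruction

class System2:
    """Slow semantic reasoning: turns an instruction into sub-goals."""
    def plan(self, obs: Observation) -> list[str]:
        # In the real system this would be a vision-language model.
        return [f"locate {obs.task_prompt}", f"grasp {obs.task_prompt}", "place in bin"]

class System1:
    """Mid-rate perception-to-action: maps a sub-goal plus image to a target."""
    def act(self, sub_goal: str, obs: Observation) -> dict:
        return {"goal": sub_goal, "target_xyz": (0.4, 0.0, 0.2)}

class System0:
    """High-rate full-body control: tracks the target with joint commands."""
    def track(self, command: dict) -> str:
        return f"joint torques toward {command['target_xyz']} for '{command['goal']}'"

def control_loop(obs: Observation) -> list[str]:
    s2, s1, s0 = System2(), System1(), System0()
    log = []
    for sub_goal in s2.plan(obs):          # slow loop
        command = s1.act(sub_goal, obs)    # medium loop
        log.append(s0.track(command))      # fast loop (would run many times per command)
    return log

print(control_loop(Observation(image="frame0", task_prompt="the cup")))
```

The point of the layering is that each level runs at a different rate: the semantic planner fires rarely, while the whole-body controller would execute many times per command.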
Topping OpenAI's MLE-bench in 12 hours! Shanghai AI Lab open-sources the algorithm-evolution framework MLEvolve
量子位· 2026-03-10 04:05
Contributed by the MLEvolve team to 量子位 | QbitAI

How many steps does it take for AI to design algorithms like a top Kaggle competitor? MLEvolve, newly open-sourced by the Shanghai AI Laboratory's "Intern" (书生) scientific discovery platform, offers an answer: it topped the MLE-bench leaderboard in 12 hours.

At its core, MLEvolve is a self-evolving machine learning system. It replaces traditional tree search with progressive Monte Carlo graph search, so that different exploration paths can share experience; a global memory layer records every success and failure, making the agent smarter the more it explores; and multi-mode code generation plus multi-agent collaboration cover the full pipeline from solution design to code review.

Algorithm discovery is itself an important form of innovative capability. Being able to autonomously design and optimize algorithms means not only mastering existing tools but also being able to create new ones. In the AI era, giving intelligent systems algorithm-level creativity is a key step toward autonomous scientific discovery.

The "Intern" scientific discovery platform (Intern Discovery) is a comprehensive platform built by the Shanghai AI Laboratory for AI-driven scientific discovery. As one of the platform's core technologies, InternAgent 1.5 builds three cooperating subsystems (generation, verification, and evolution), abstracting research into a continuously iterating process of intelligent reasoning. MLEvolve serves as the solution-optimization engine of the verification subsystem, focusing on algorithm design and optimization tasks, in ...
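As a rough illustration of how a shared global memory lets different search paths exchange experience, here is a toy evolutionary search over candidate solutions. The mutation and scoring functions are stand-ins for LLM-driven code editing and validation runs; nothing here reflects MLEvolve's actual components.

```python
# Toy search over a graph of candidate solutions with a shared global memory,
# illustrating the "experience sharing across paths" idea described above.
# Scoring, mutation, and node structure are invented for the demo.

import random

random.seed(0)

class GlobalMemory:
    """Records every attempted candidate and its score, shared by all paths."""
    def __init__(self):
        self.history: dict[str, float] = {}

    def record(self, candidate: str, score_value: float):
        self.history[candidate] = score_value

    def best(self) -> tuple[str, float]:
        return max(self.history.items(), key=lambda kv: kv[1])

def mutate(candidate: str) -> str:
    # Stand-in for LLM-driven code editing: append a random tweak.
    return candidate + random.choice(["+a", "+b", "+c"])

def score(candidate: str) -> float:
    # Stand-in for running the candidate on a validation set.
    return candidate.count("+a") - 0.1 * len(candidate)

def evolve(seed: str, iterations: int = 30) -> tuple[str, float]:
    memory = GlobalMemory()
    memory.record(seed, score(seed))
    for _ in range(iterations):
        # Expand from the best-known node in the whole graph, not just the
        # current subtree: this is what the shared memory enables.
        parent, _ = memory.best()
        child = mutate(parent)
        memory.record(child, score(child))
    return memory.best()

best, best_score = evolve("base")
print(best, round(best_score, 2))
```

Because every path writes into (and reads from) the same memory, an improvement found on one branch immediately redirects all future expansion, which is the contrast with plain tree search the article draws.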
Jeff Dean's latest interview: developers will each manage 50 agents, and writing requirements becomes the core skill
量子位· 2026-03-10 02:13
Core Insights
- Google's Chief AI Scientist Jeff Dean predicts that each engineer may eventually manage 50 AI agents, completing many tasks in parallel and communicating more efficiently than humans [1].
- The most important future skill will be "writing clear requirements," since the output quality of AI agents depends entirely on how well problems are defined [2][3].

Group 1: AI Model Development
- Google follows a Pareto-frontier strategy, offering both high-end models for complex tasks and cost-effective models for low-latency scenarios [3][19].
- The Gemini 3 Flash model achieves both speed and intelligence through distillation, which lets smaller models closely match the performance of larger ones [5][6][8].
- Distillation trains small models on large models' outputs, transferring refined behaviors and capabilities [7][24][25].

Group 2: Low Latency and Multi-Modal Models
- Dean emphasizes the value of low latency, arguing that a 20-50× reduction in latency would significantly enhance user experience [9][153].
- Gemini is designed to be multi-modal, understanding not only human-perceived modalities like text and images but also "non-human" modalities such as LiDAR and medical imaging data [39][44][46].

Group 3: Future of AI and Engineering
- Engineers will spend more time on design and specifications, as clear communication becomes crucial for effective AI collaboration [144][150].
- The ability to express requirements clearly will become a core skill, affecting not just software engineering but any complex task [145][146].
- Dean predicts that truly personalized models, capable of understanding individual users' contexts and histories, will be extremely important [156].

Group 4: Hardware and Efficiency
- Co-design of hardware and machine learning is essential for optimizing performance and efficiency [80][84].
- Future advances in specialized hardware will significantly reduce model latency and improve capabilities, transforming a range of application scenarios [158].
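The distillation idea mentioned above, a small model trained to match a large model's softened output distribution, can be sketched with the standard soft-label KL loss. This is the generic textbook formulation, not Gemini's actual training recipe.

```python
# Minimal knowledge-distillation loss: the student is trained to match the
# teacher's temperature-softened output distribution. Generic formulation,
# not Google's Gemini recipe.

import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]
aligned_student = [3.9, 1.1, 0.4]   # close to the teacher -> small loss
random_student = [0.0, 0.0, 0.0]    # uniform outputs -> larger loss

assert distillation_loss(teacher, aligned_student) < distillation_loss(teacher, random_student)
print(distillation_loss(teacher, aligned_student))
```

Raising the temperature exposes the teacher's relative preferences among wrong answers, which is a large part of what the student learns from.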
Over 50,000 tokens for a 10-second video and O(n²) can't keep up? A post-training linearization framework delivers a 1.71× speedup and sharply lower inference cost | CVPR 2026
量子位· 2026-03-10 02:13
Core Viewpoint
- The article covers challenges and advances in video diffusion models, focusing on LINVIDEO, a framework that enables significant linearization of the video generation process without additional data or retraining, while maintaining quality [3][25].

Group 1: Challenges in Video Diffusion Models
- Video generation has entered a large-scale era, but computational costs have surged [1].
- Self-attention in video generation scales as O(n²), which becomes impractical when token counts exceed 50,000 for a 10-second video [2].
- Linearizing video diffusion models is difficult because the replacement process is sensitive: different layers affect final generation quality to very different degrees [7].

Group 2: LINVIDEO Framework
- LINVIDEO is a post-training framework that linearizes a high proportion of a video diffusion model while preserving generation quality [3][6].
- It uses a selective-transfer approach, treating layer selection as a binary decision problem so the model learns which layers can be safely linearized [15][25].
- LINVIDEO also introduces anytime distribution matching (ADM), which aligns sample distributions across any timestep, improving efficiency and effectiveness without auxiliary models [15][25].

Group 3: Experimental Results
- LINVIDEO achieved a 1.71× end-to-end speedup on the Wan 14B model; combined with 4-step distillation, it reached up to 20.9× acceleration with nearly unchanged video quality [6][19].
- Compared with other methods, LINVIDEO reached a latency of 68.26 seconds (1.43× speedup) on the Wan 1.3B model and 1,127 seconds (1.71× speedup) on the Wan 14B model [17][19].
- Overall, LINVIDEO offers a practical path from O(n²) attention to more scalable O(n) inference for video diffusion models [25].
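The move from O(n²) to O(n) comes from reassociating the attention product so the n×n score matrix is never formed. Below is a minimal pure-Python sketch of generic kernelized linear attention with a simple positive feature map; it illustrates the technique, not LINVIDEO's actual layer.

```python
# Softmax attention forms an n-by-n score matrix; linear attention
# reassociates (phi(Q) phi(K)^T) V into phi(Q) (phi(K)^T V), avoiding the
# n x n intermediate so per-query cost is O(d^2), independent of n.
# Generic sketch, not LINVIDEO's layer.

import math

def feature_map(row):
    # elu(x) + 1: keeps features positive, a common linear-attention choice
    return [x + 1.0 if x > 0 else math.exp(x) for x in row]

def linear_attention(Q, K, V):
    n, d = len(Q), len(Q[0])
    Qf = [feature_map(q) for q in Q]
    Kf = [feature_map(k) for k in K]
    # Accumulate K^T V (d x d) and the normalizer sum_j phi(k_j) once.
    KV = [[0.0] * d for _ in range(d)]
    ksum = [0.0] * d
    for j in range(n):
        for a in range(d):
            ksum[a] += Kf[j][a]
            for b in range(d):
                KV[a][b] += Kf[j][a] * V[j][b]
    # Each query is now a d x d contraction: no n x n matrix anywhere.
    out = []
    for i in range(n):
        z = sum(Qf[i][a] * ksum[a] for a in range(d))
        out.append([sum(Qf[i][a] * KV[a][b] for a in range(d)) / z for b in range(d)])
    return out

Q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.1, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = linear_attention(Q, K, V)
print(len(out), len(out[0]))  # 3 2
```

Because the feature map is positive, each output row is still a normalized weighted mix of the value rows, which is what makes swapping the layer in for softmax attention plausible at all.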
Just one minute! Getting a full-power "lobster" onto your computer is now as easy as downloading an app
量子位· 2026-03-10 02:13
Core Viewpoint
- The article covers the launch of AutoClaw, a new application from Zhipu that simplifies deploying AI-driven tasks, making them accessible to users without technical expertise [5][29].

Group 1: Application Features
- AutoClaw lets users execute tasks from a simple prompt, such as summarizing the latest OpenClaw-related news from various platforms [2][4].
- The application installs and sets up in about a minute, with no complex configuration [5][29].
- It supports multiple models, including Pony-Alpha-2, which is designed specifically for OpenClaw tasks [7][33].

Group 2: User Experience
- Users can integrate AutoClaw with communication tools like Feishu for automated task management and information retrieval [14][35].
- It ships with more than 50 preloaded skills, letting users automate tasks without first learning the software [8][35].
- AutoClaw can perform complex multi-step tasks and browser operations, increasing its usefulness for everyday users [34][33].

Group 3: Market Implications
- AutoClaw's launch marks a shift in AI interaction paradigms, from complex setups to user-friendly applications [25][30].
- By lowering the barrier to entry for AI tools, Zhipu is positioning itself to capture a broader market of non-technical users [28][30].
- The application makes powerful AI tools accessible to the general public, potentially transforming how individuals and businesses operate [38].
The first physical-AI data base platform, "Wuyin," lands in Zhejiang, targeting the robot data famine and covering home, industrial, and commercial scenarios
量子位· 2026-03-09 10:05
Core Viewpoint
- The article discusses the rise of embodied intelligence and world models in the physical-AI sector, which together have attracted over 30 billion yuan in investment this year. The two represent, respectively, the practical application and the training ground of physical AI. However, the industry struggles to bridge the Sim2Real gap, and relying on real data is complicated by high costs and poor scalability [1][2].

Group 1: Investment and Market Trends
- In the first two months of the year, the embodied-intelligence sector saw over 20 financing rounds, with cumulative investment exceeding 20 billion yuan [3].
- The influx of capital is driving technological advances, enabling robots to move from stage performances to real-world applications, though many tasks still involve unstructured scenarios that demand strong generalization from models [4].

Group 2: Data Challenges and Solutions
- The lack of high-quality data is a major pain point for embodied intelligence, which requires multimodal data with physical feedback and interaction, currently in short supply [6].
- The industry is exploring three main approaches to the data problem, with a recent shift toward virtual-real fusion to supply large-scale data for physical AI [7][8].

Group 3: Launch of the Data Base Platform
- The first physical-AI data base platform, "Wuyin," was launched by Wenshijia to address the industry's shortage of high-quality data. The platform has accumulated over 1,000 TB of data and plans to open-source 10,000 hours of high-quality data [11][15].
- Its core capabilities include a high-quality data system, a valuable scene ecosystem, and a Real2Sim2Real closed-loop toolchain, covering the end-to-end process from data collection to industrial application [11][17].

Group 4: Industry Collaboration and Ecosystem
- The Wuyin platform has been recognized for providing high-quality data and supporting the full chain from training to commercialization, attracting over 50 ecosystem partners [29].
- Strategic collaborations with companies such as Horizon and Digua Robot underscore the importance of data in driving physical-AI progress [31][38].

Group 5: Founder's Insights and Industry Context
- Wenshijia was founded by Liu Shengxiang, a pioneer in autonomous-driving data and testing validation, who recognized the critical need for high-quality data in physical AI [34][36].
- The physical-AI sector is growing rapidly, with major companies entering the field, reflecting a consensus that the next wave of AI will be driven by physical AI [39].
New paper from LeCun's team: building AI by imitating human intelligence is a copycat dead end
量子位· 2026-03-09 10:05
Core Viewpoint
- The pursuit of Artificial General Intelligence (AGI) may have been misguided from the start; the paper proposes refocusing on Superhuman Adaptable Intelligence (SAI) rather than merely mimicking human intelligence [1][2].

Group 1: Key Changes in AI Development Goals
- AI development goals are shifting along three key lines, emphasizing the speed of adapting to new tasks over achieving human-like intelligence [3][5].
- SAI aims both to surpass human capabilities on tasks humans can perform and to tackle areas humans have never explored [5][6].
- The focus moves from how many skills an AI has to how quickly it can learn new ones [6][12].

Group 2: Critique of Human-Centric AI Development
- Using humans as the benchmark for intelligence is seen as problematic, as it may cap AI's potential [10][11].
- The paper argues that aiming merely for human-level performance could hold back AI's development [11][16].
- The authors suggest that optimizing for speed of adaptation to new tasks is more beneficial than simply imitating human capabilities [12][13].

Group 3: Understanding Human Intelligence Limitations
- Human intelligence is not as "general" as often perceived; it is primarily a survival tool shaped by evolution [18][20].
- Many abilities considered "general" are actually evolutionary adaptations, and humans perform poorly at tasks like complex calculation compared with computers [22][23].
- AGI may be an illusion insofar as it overlooks the biological limitations of human intelligence [25][30].

Group 4: Emphasis on Specialization
- Specialization is presented as the norm in the evolution of intelligence, in both biology and AI systems [31][32].
- AI systems face pressure to optimize for specific tasks, since general models may not meet the demands of critical applications [34][40].
- The success of AI algorithms often comes from their alignment with the structure of the problems they are designed to solve [38][39].

Group 5: Proposed Technical Pathways for SAI
- The authors propose three technical pathways toward SAI: self-supervised learning, world models, and modular systems [43].
- Self-supervised learning lets AI learn from real-world data without human labeling [44].
- World models let AI simulate environments and predict outcomes, enabling task completion without explicit training [45][46].
- A modular architecture is favored over a single one-size-fits-all model, promoting collaboration among specialized systems [47][48].
Diffusion models finally learn to "size up the problem"! Compute is allocated dynamically by prompt difficulty, saving time on easy prompts while preserving quality on hard ones
量子位· 2026-03-09 10:05
Core Viewpoint
- The article introduces CoTj, a new framework from China Unicom's Data Science and AI Research Institute that lets diffusion models dynamically allocate computational resources according to prompt complexity, significantly improving image-generation quality [4][35].

Group 1: Framework and Mechanism
- CoTj gives diffusion models "System 2" planning capabilities, enabling them to allocate computational resources dynamically according to prompt complexity [4][14].
- CoTj follows a "Predict-Plan-Execute" reasoning paradigm, with a lightweight predictor that estimates the current Diffusion DNA from condition embeddings for rapid prediction [14][15].
- The framework casts the complex sampling process as a directed acyclic graph (DAG) optimization problem, enabling efficient trajectory planning [11][13].

Group 2: Performance and Results
- In experiments, CoTj produced superior image quality even with a basic first-order solver, outperforming traditional methods that used high-order solvers under the same conditions [22][24].
- It achieved significant improvements in accuracy and speed across models, including a 60% reduction in mean squared error (MSE) and an increase of over 6 dB in peak signal-to-noise ratio (PSNR) [25][28].
- CoTj's trajectory planning maintains high fidelity even with drastically fewer sampling steps, preserving essential details that traditional methods often lose [27][29].

Group 3: Future Directions
- The team plans to extend CoTj's theoretical foundation to more complex video dynamics and to explore unsupervised Diffusion DNA discovery across modalities [36][37].
- The framework marks a significant leap in computational efficiency and resource-aware planning for generative AI, opening a new era for diffusion models [35][36].
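The "easy prompts get fewer steps, hard prompts keep quality" behavior can be illustrated with a toy difficulty predictor and step scheduler. The features, weights, and thresholds below are invented for the demo; they are not the actual CoTj predictor.

```python
# Toy illustration of difficulty-aware compute allocation for a diffusion
# sampler: a cheap predictor scores the prompt, and the scheduler picks the
# number of denoising steps accordingly. All features and constants are
# invented assumptions, not CoTj's actual mechanism.

def predict_difficulty(prompt: str) -> float:
    """Crude proxy: longer, more compositional prompts score as harder."""
    words = prompt.split()
    relational = sum(w in {"on", "under", "next", "between", "holding"} for w in words)
    return min(1.0, 0.05 * len(words) + 0.2 * relational)

def allocate_steps(prompt: str, min_steps: int = 10, max_steps: int = 50) -> int:
    """Interpolate between a cheap and an expensive sampling budget."""
    d = predict_difficulty(prompt)
    return round(min_steps + d * (max_steps - min_steps))

for p in ["a cat",
          "a cat sitting on a red chair next to a vase holding flowers"]:
    print(p, "->", allocate_steps(p), "steps")
```

A simple prompt lands near the cheap end of the budget while a compositional one gets the full budget, which is the resource-aware behavior the article describes.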
If the reward is differentiable, fine-tune on it directly! Overturning flow-matching alignment with the HJB equation | NeurIPS'25
量子位· 2026-03-09 06:05
Core Insights
- The article introduces VGG-Flow, a new approach to fine-tuning diffusion models with reinforcement learning that directly uses differentiable reward functions for more efficient and stable model alignment [1][3][26].

Group 1: Methodology
- The VGG-Flow team reformulates reward fine-tuning as a continuous-time optimal control problem, using the Hamilton-Jacobi-Bellman (HJB) equation to convert differentiable rewards into value gradients [3][4][26].
- The core idea is to maximize the terminal reward minus the cumulative cost, with the flow-matching model simulating trajectories from time t=0 to t=1 [4][8].
- The fine-tuned velocity field is expressed as the sum of the pre-trained model and a residual component, keeping the model from drifting too far from its original distribution during fine-tuning [5][7].

Group 2: Optimal Control Perspective
- From the optimal-control standpoint, a terminal-state objective is combined with a path-cumulative cost, yielding a value function that describes the optimal value attainable from a given state [8][9].
- The value function evolves according to the HJB equation, the continuous-time counterpart of the Bellman equation used in reinforcement learning [10][12].

Group 3: Parameterization and Optimization
- VGG-Flow introduces a forward-looking parameterization to guide the value gradient in the early phase of training, which includes estimating terminal points and parameterizing the value gradient with reward gradients [14][15].
- The loss function combines a gradient-matching loss, a terminal boundary loss, and optionally a residual term, ensuring the model aligns with the value gradient effectively [17][19][20].

Group 4: Experimental Results
- On Stable Diffusion 3, VGG-Flow achieved stable improvements in reward signals with only 400 updates, demonstrating high convergence efficiency while maintaining diversity in generated outputs [21][26].
- Compared with existing methods such as ReFL and DRaFT, VGG-Flow was more robust, preserving the pre-trained model's priors and producing more natural generations [21][26].
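In generic optimal-control notation, the setup summarized above (residual velocity field, terminal reward minus running cost, value function, HJB condition) can be sketched as follows. The notation is illustrative and may differ from the paper's exact formulation.

```latex
% Generic continuous-time control view of reward fine-tuning, consistent
% with the summary above; symbols are illustrative, not the paper's own.
\begin{aligned}
&\text{dynamics:} && \dot{x}_t = v_\theta(x_t, t) = v_{\mathrm{pre}}(x_t, t) + u_\theta(x_t, t),\\
&\text{objective:} && \max_\theta \; r(x_1) - \int_0^1 c\bigl(u_\theta(x_t, t)\bigr)\, dt,\\
&\text{value function:} && V(x, t) = \max_{u} \Bigl[\, r(x_1) - \int_t^1 c(u_s)\, ds \;\Bigm|\; x_t = x \Bigr],\\
&\text{HJB:} && -\partial_t V = \max_{u} \bigl[\, \nabla_x V \cdot (v_{\mathrm{pre}} + u) - c(u) \,\bigr].
\end{aligned}
```

Here $v_{\mathrm{pre}}$ is the frozen pre-trained velocity field, $u_\theta$ the learned residual penalized by the running cost $c$, and $\nabla_x V$ is the value gradient that the method matches against reward gradients.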