NeurIPS 2025 Awards Announced: Alibaba Qwen's Gated Attention Wins Best Paper, Kaiming He's Faster R-CNN Wins the Test of Time Award
量子位· 2025-11-27 03:00
Core Insights
- NeurIPS 2025 awarded Best Paper honors to four papers, three of them authored by Chinese researchers, including the Gated Attention paper from Alibaba's Qwen team [1][2][6]

Group 1: Best Papers
- The four Best Papers cover breakthroughs in diffusion model theory, self-supervised reinforcement learning, large language model attention mechanisms and reasoning capabilities, online learning theory, neural scaling laws, and diversity benchmarking methods for language models [2]
- The first paper, "Artificial Hivemind," examines the lack of diversity in large language models, revealing substantial repetition and homogeneity both within and across models, with over 60% of responses showing similarity above 0.8 [7][8][16]
- The second paper, "Gated Attention for Large Language Models," investigates why gated attention mechanisms are effective, demonstrating that specific gating strategies improve both model performance and training stability [17][20][24]

Group 2: Test of Time Award
- The Test of Time Award went to Faster R-CNN, a deep learning model for object detection that dramatically improved detection speed and achieved near real-time performance [3][4][48]
- Faster R-CNN introduces a Region Proposal Network (RPN) that shares convolutional features with the detection network, removing the region-proposal computational bottleneck of traditional object detection methods [52]
- The framework achieved state-of-the-art detection accuracy on datasets including PASCAL VOC and MS COCO and has shaped subsequent developments in computer vision [53][55]

Group 3: Research Findings
- The self-supervised reinforcement learning paper demonstrates that increasing network depth can enhance performance, achieving up to a 50-fold improvement in certain environments [25][29][31]
- The diffusion model research identifies critical training time scales for generalization and memorization, revealing that stopping training within a specific window can prevent overfitting [40][44]
- The findings suggest that depth expansion is more computationally efficient than width expansion, and that jointly expanding the depth of actor and critic networks yields complementary performance gains [34][36]
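The gated-attention idea can be made concrete with a minimal single-head sketch in NumPy. This shows one common gating strategy, an input-dependent sigmoid gate applied element-wise to the attention output; the weight names and shapes here are illustrative, not the Qwen team's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention(x, w_q, w_k, w_v, w_g):
    # Standard single-head scaled dot-product attention.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn_out = softmax(scores) @ v
    # Input-dependent sigmoid gate applied element-wise to the
    # attention output (one of several possible gate placements).
    gate = sigmoid(x @ w_g)
    return gate * attn_out

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))              # 5 tokens, model dim 8
w_q, w_k, w_v, w_g = (rng.standard_normal((8, 8)) for _ in range(4))
y = gated_attention(x, w_q, w_k, w_v, w_g)   # gated output, shape (5, 8)
```

Because the sigmoid gate is bounded in (0, 1), gating can only attenuate the attention output, which is one intuition for the training-stability gains the paper reports.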
No Body, No AGI! Hillbot's Su Hao in Conversation with Qianxun's Gao Yang: The Embodied Intelligence Bubble Is Big, but the Progress Is Real
量子位· 2025-11-27 03:00
Core Viewpoints
- The discussion emphasizes that embodied intelligence is essential for achieving artificial general intelligence (AGI) [2][19]
- The path to AGI requires physical interaction with the environment, which embodied intelligence makes possible [21][23]

Group 1: Insights from Experts
- Su Hao asserts that without embodied intelligence there can be no general physical intelligence, and hence no general intelligence [2][16]
- Gao Yang argues that scaling data is the crux of solving embodied intelligence, and that the essence of the challenge remains unchanged [3][10]
- Both experts agree that embodied intelligence is a key entry point for understanding AGI [3][4]

Group 2: Challenges and Opportunities
- The conversation addresses the technical bottlenecks in the evolution of embodied intelligence and the structural advantages China holds in this field [7][24]
- The experts stress the importance of real-world data for training models, with China enjoying a significant advantage in data-iteration efficiency compared to the U.S. [27][28]
- They note that integrated hardware-software design is critical to the success of embodied intelligence [26][30]

Group 3: Future Predictions
- The next significant breakthrough in embodied intelligence may arrive within 2-3 years, most likely an embodied model akin to GPT-3.5 [41][39]
- The experts believe AGI will emerge through a continuous series of breakthroughs rather than a single event [38][40]
- The discussion concludes that the current state of embodied intelligence combines real progress with considerable hype [31][32]
Explosive Power Beyond Boston Dynamics' Hydraulic Robots: PHYBOT M1 Achieves the World's First Perfect Human-Like Backflip by a Full-Size Heavy-Duty Electric Humanoid Robot
量子位· 2025-11-26 09:33
Core Viewpoint
- The article discusses the groundbreaking capabilities of PHYBOT M1, a full-size humanoid robot developed by Dongyi Technology, which has performed a perfect backflip, showcasing advanced dynamic performance and the potential to surpass human capabilities in complex environments [4][5][8].

Group 1: Technological Innovations
- PHYBOT M1 is the world's first full-size heavy-duty electric-driven humanoid robot to perform a perfect backflip, demonstrating explosive power exceeding that of Boston Dynamics' hydraulic Atlas [5][8].
- The robot's design aims to validate and push the boundaries of the core capabilities humanoid robots need to operate in real-world, high-intensity production environments [7][8].
- PHYBOT M1's electric drive system surpasses traditional hydraulic systems in dynamic performance, achieving peak joint torque above 800 N·m and total peak power output exceeding 10,000 W [16].

Group 2: Challenges in Humanoid Robotics
- The industry struggles to balance dynamics, structure, and control in full-size humanoid robots; most high-dynamic robots are smaller because lower inertia makes the center of gravity easier to control [9][10][11].
- Achieving high motion capability in larger humanoid robots requires substantial increases in joint peak torque, energy density, and structural durability [11][12].

Group 3: Advanced Algorithms and Training
- Dongyi Technology developed three novel algorithms that let PHYBOT M1 perform complex movements such as backflips, addressing the challenges posed by its high inertia and the need for precise control [17][19][20].
- The training pipeline includes a dual-stage method that lets the robot perform at its limits safely and improves the transfer of policies from simulation to the real world [21][22].

Group 4: Future Implications
- The successful backflip is seen as a significant milestone in the evolution of robotics, marking a shift from basic locomotion to advanced dynamic movements [23].
- Dongyi Technology aims to leverage PHYBOT M1's capabilities to turn humanoid robots from technology demonstrations into reliable productivity tools across industries [24].
Seats Going Fast! Lock In MEET2026 and Let's Talk AI | Latest Speaker Lineup
量子位· 2025-11-26 09:33
Core Insights
- The MEET2026 Smart Future Conference will focus on the cutting-edge technologies and industry developments that drew the most attention this year [1]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies are penetrating industries, disciplines, and scenarios, becoming a core driving force of societal evolution [2]

Group 1: Conference Highlights
- The conference will cover this year's hot topics in tech, including reinforcement learning, multimodal AI, chip computing power, AI across industries, and Chinese AI going global [3]
- The event will showcase the latest collisions between academic frontiers and commercial applications, featuring leading achievements across infrastructure, models, and products [4]
- The conference will also host the authoritative release of the annual AI rankings and the annual AI trend report [5][93]

Group 2: Notable Speakers
- Zhang Yaqin, Dean of Tsinghua University's Institute for AI Industry Research and an academician of the Chinese Academy of Engineering, has a distinguished background in AI and digital video technologies [11][12]
- Sun Maosong, Executive Vice Dean of Tsinghua University's Institute for Artificial Intelligence, has led multiple national projects and has extensive experience in AI research [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in core AI technology development and has published over 100 papers [19]

Group 3: AI Trends and Rankings
- The "Artificial Intelligence Annual Rankings," initiated by Quantum Bit, have become among the most influential rankings in the AI industry, evaluating companies, products, and individuals across three dimensions [94]
- The "2025 Annual AI Trend Report" will analyze ten major AI trends by technological maturity, implementation status, and potential value, highlighting representative institutions and best cases [95]

Group 4: Event Details
- The MEET2026 Smart Future Conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [96]
- The conference expects to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer of the smart technology industry [98]
Brand-New Sparse Attention Optimization! Decoding the Core Technology of Tencent's Ultra-Lightweight Video Generation Model HunyuanVideo 1.5
量子位· 2025-11-26 09:33
Core Insights
- Tencent's HunyuanVideo 1.5 has been officially released and open-sourced: a lightweight video generation model built on the Diffusion Transformer (DiT) architecture with 8.3 billion parameters, capable of generating 5-10 seconds of high-definition video [1][2].

Model Capabilities
- The model supports video generation from both text and images, maintains high consistency between input images and output videos, and can accurately follow diverse instructions covering camera movements, character emotions, and varied scenes [5][7].
- It natively generates 480p and 720p HD video, with optional upscaling to 1080p cinematic quality via a super-resolution model, and runs on consumer-grade graphics cards with 14GB of memory, putting it within reach of individual developers and creators [6].

Technical Innovations
- HunyuanVideo 1.5 balances generation quality, performance, and model size through multi-layered technical innovations built on a two-stage framework [11].
- The first stage uses the 8.3B-parameter DiT model for multi-task learning; the second stage enhances visual quality with a video super-resolution model [12].
- The lightweight, high-performance architecture achieves significant compression and efficiency, delivering leading generation results with minimal parameters [12].
- An innovative sparse attention mechanism, SSTA (Selective and Sliding Tile Attention), prunes computation over long video sequences, improving generation efficiency by 1.87x over FlashAttention3 [15][16].

Training and Optimization
- The model uses a large language model as its text encoder for enhanced multi-modal understanding, improving the accuracy of text elements in generated video [20].
- A full-pipeline training optimization strategy spans pre-training through post-training, improving motion coherence and aesthetic quality [20].
- Reinforcement learning strategies are tailored separately for image-to-video (I2V) and text-to-video (T2V) tasks to correct artifacts and improve motion quality [23][24].

Use Cases
- Example generations include cinematic scenes such as a bustling Tokyo intersection and a cyberpunk street corner, showcasing the model's ability to create visually appealing, contextually rich content [29][30].
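To see why tile-based sparse attention cuts cost on long video token sequences, here is a toy sliding-tile attention mask in NumPy. This illustrates only the general idea; SSTA additionally selects tiles by content, and the function name and parameters below are invented for illustration:

```python
import numpy as np

def sliding_tile_mask(n_tokens, tile_size, window):
    # Tokens are grouped into tiles; each tile attends only to tiles
    # within a sliding window, so attention cost scales with the
    # window size rather than the full sequence length.
    n_tiles = -(-n_tokens // tile_size)  # ceiling division
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for i in range(n_tiles):
        lo = max(0, i - window) * tile_size
        hi = min(n_tiles, i + window + 1) * tile_size
        rows = slice(i * tile_size, min((i + 1) * tile_size, n_tokens))
        mask[rows, lo:min(hi, n_tokens)] = True
    return mask

mask = sliding_tile_mask(n_tokens=64, tile_size=8, window=1)
density = mask.mean()  # fraction of token pairs actually attended
```

With 64 tokens, a tile size of 8, and a one-tile window, only about a third of all token pairs are attended, and the saving grows as sequences lengthen at a fixed window size.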
An Open-Source Model Challenges Nano Banana Pro! The Original Stable Diffusion Team Is Back
量子位· 2025-11-26 09:33
Core Insights
- The article covers the launch of Flux.2, a new AI image generation model from Black Forest Labs that aims to compete with Google's Nano Banana Pro by offering comparable image quality at a lower cost [1][42].

Group 1: Product Features
- Flux.2 is positioned as a productivity tool that extends users' image generation capabilities [2].
- The model supports multiple reference images, enabling complex generation tasks such as fashion editorial images with consistent characters [3].
- Flux.2 ships in several versions, including Flux.2 [pro], [flex], [dev], and an upcoming [klein], each tailored to different user needs and performance requirements [16][17].

Group 2: Performance Comparison
- Initial tests put the [pro] version's generation speed at under 10 seconds, with support for up to 10 reference images [17].
- While Flux.2 shows significant improvements in instruction adherence and fine-grained control, it still trails Nano Banana Pro in overall image quality [39][40].
- Users report that Flux.2 performs well at photo restoration and image editing, often producing results more natural than Nano Banana 2's [46][48].

Group 3: Market Positioning
- Flux.2 is positioned as a cost-effective alternative to Google's models, offering high-quality output at a lower price point, which appeals to users facing high costs with Nano Banana Pro [42].
- The model supports high-resolution image editing up to 4MP for users who need detailed outputs [44].
- The article notes the lineage of the Flux models: Flux.1 was a benchmark in AI image generation before Flux.2's introduction [56][59].
China's Largest AI "Academia-Industry-Talent" Gathering Is Here! 20 Academicians + 50 Deans + 300 Experts Convene in Haidian, Beijing
量子位· 2025-11-26 06:37
Core Insights
- The upcoming 2025 China Artificial Intelligence Conference aims to address the future of AI development and talent cultivation amid a rapidly evolving technological landscape [1][5][347]
- The conference will feature discussions among academic elites and industry pioneers, focusing on integrating academic research with practical AI applications [4][5][347]

Event Details
- The conference is scheduled for January 29-30, 2025, in Haidian, Beijing [2]
- It will gather over 20 domestic and international academicians, 50+ deans of AI colleges, and more than 300 experts and scholars from academia and industry [347]

Themes and Objectives
- The theme, "Intelligence Ushers in a New Era, Haidian Creates the Future," emphasizes deep dialogue between academic frontiers and educational foundations [347]
- The conference aims to explore the coupling of innovation chains, industry chains, and talent chains to inject new momentum into AI development during the 14th Five-Year Plan [347]

Technical Focus Areas
- Key topics include secure and trustworthy AI models, embodied intelligence, and the integration of 6G and AI technologies [347]
- The conference will also address AI's role across sectors including healthcare, environmental sustainability, and digital infrastructure [349][350]

Educational and Research Initiatives
- The event will promote interdisciplinary talent cultivation and education-industry integration to build a self-reliant innovation system [351]
- It will feature interactive exhibitions and activities to foster collaboration between academia and industry [352]

Publications and Strategic Directions
- The conference will release the "Beijing Artificial Intelligence Industry White Paper (2025)" and the "Action Plan for Building a Global AI Industry Hub (2025-2027)" [352]
- It will also identify the "Top Ten Issues in the AI Field for 2026," providing strategic direction for future AI research and innovation [352]
量子位 (Quantum Bit) Is Hiring Editors and Writers
量子位· 2025-11-26 06:37
Core Viewpoint
- The article highlights the ongoing AI boom and invites candidates to join Quantum Bit, which tracks AI advancements and has established itself as a leading content platform in the industry [1].

Recruitment Opportunities
- The company is hiring across three directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4].
- All positions are full-time and based in Zhongguancun, Beijing [2].

Job Responsibilities
- AI Industry: covers innovations in infrastructure, including chips, AI infrastructure, and cloud computing [5].
- AI Finance: tracks venture capital and financial reporting in the AI sector, monitoring capital movements within the industry [6].
- AI Product: focuses on AI applications and hardware advances [6].

Benefits of Joining
- First-hand exposure to the latest AI technologies and products, deepening understanding of the AI landscape [6].
- Active use of new AI tools to improve work efficiency and creativity [6].
- Opportunities to build personal influence by writing original content and engaging with industry leaders at major tech events [6].
- Mentorship from senior editors to accelerate new hires' professional growth [6].
- Competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses [6].

Company Overview
- As of 2025, Quantum Bit has over 2.4 million WeChat subscribers and more than 7 million users across the internet, with average daily readership exceeding 2 million [12].
- Third-party data platforms rank it as the top new-media outlet in AI and frontier technology [12].
ROCK & ROLL! Alibaba Builds a Live-Fire Training Ground for AI Agents | Open Source
量子位· 2025-11-26 06:37
Core Insights
- The article introduces ROCK, a new open-source project from Alibaba that tackles the challenge of scaling AI training in real environments [2][5].
- Together with the existing ROLL framework, ROCK closes the training loop for AI agents, letting developers deploy standardized training environments without complex setup [3][4][5].

Group 1: AI Training Environment
- As large language models (LLMs) evolve into agentic models, they must interact deeply with external environments, moving beyond text generation to executing actions [6][7].
- A stable, efficient training environment is crucial to the scaling potential of agentic models, directly affecting performance and learning [9][10].
- Training bottlenecks often stem from the environment itself, so progress requires both high-performance RL frameworks and efficient environment management systems [10].

Group 2: ROLL Framework
- ROLL is built on Ray and designed for large-scale reinforcement learning, covering the full RL optimization pipeline from small-scale research to production runs with billions of parameters [12].
- ROLL speeds up training through asynchronous interactions and redundancy sampling, exposed through a simplified standard interface called GEM [13][14].
- ROLL's design allows quick adaptation to new applications, integrating tasks from simple games to complex tool interactions [15].

Group 3: ROCK's Features
- ROCK enables scaling by running thousands of environment instances concurrently, overcoming the resource limits of traditional training setups [22][24].
- It provides a unified environment resource pool for rapid deployment and management, cutting setup time from days to minutes [25][26].
- ROCK lets homogeneous and heterogeneous environments run side by side in the same cluster, improving agents' generalization [27][28].

Group 4: Debugging and Stability
- ROCK counters "black box" environments with a comprehensive debugging interface for deep interaction with multiple remote sandboxes [30][33].
- The system targets enterprise-grade stability, with fault isolation and precise resource scheduling to ensure high-quality data collection and model convergence [41][44].
- Fast state management means failed environments can be reset rapidly, preserving the continuity of the training pipeline [45].

Group 5: ModelService Integration
- ROCK introduces ModelService as an intermediary that decouples agent business logic from the training framework, smoothing collaboration between the two [50][51].
- This architecture reduces maintenance complexity and improves cost efficiency: GPU resources concentrate on centralized inference services while large-scale environments run on cheaper CPU instances [57].
- The design supports custom agent logic while maintaining robust training capabilities [58].
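The kind of standardized environment interface ROLL and ROCK build around can be sketched with a minimal Gym-style text environment. The class and method names below follow the common reset/step convention and are purely illustrative; they are not the actual GEM or ROCK API:

```python
class GuessEnv:
    """Minimal Gym-style text environment sketch (illustrative only)."""

    def __init__(self, target: int):
        self.target = target
        self.done = False

    def reset(self) -> str:
        # Start a new episode and return the initial observation.
        self.done = False
        return "Guess an integer between 0 and 9."

    def step(self, action: str):
        # Apply the agent's action; return (observation, reward, done).
        guess = int(action)
        self.done = guess == self.target
        reward = 1.0 if self.done else 0.0
        obs = "correct" if self.done else "try again"
        return obs, reward, self.done

env = GuessEnv(target=7)
_ = env.reset()
obs, reward, done = env.step("3")      # wrong guess
obs2, reward2, done2 = env.step("7")   # correct guess
```

An agent loop then reduces to calling reset once and step repeatedly until done, which is exactly what lets a framework schedule thousands of such instances concurrently.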
Breaking the Performance Bottleneck of Brain-Inspired Models: Correcting Frequency Bias for Dual Gains in Performance and Energy Efficiency | NeurIPS 2025
量子位· 2025-11-26 06:37
Core Insights
- The article examines the limitations of Spiking Neural Networks (SNNs) and introduces Max-Former, a new architecture that lifts those limits by enhancing high-frequency information processing [5][24].

Group 1: Performance Limitations of SNNs
- SNNs have traditionally been viewed as inferior to Artificial Neural Networks (ANNs), with their binary pulse transmission blamed for significant information loss [5][6].
- The research shows the real issue is frequency bias: spiking neurons act as low-pass filters, suppressing high-frequency components while favoring low-frequency information [4][8][19].
- This frequency imbalance degrades the feature representation capability of SNNs, limiting their performance [10][23].

Group 2: Introduction of Max-Former
- Max-Former counteracts SNNs' inherent low-frequency preference with two lightweight "frequency-enhancing lenses" [24][28].
- An additional Max-Pool operation in the Patch Embedding stage actively injects high-frequency signal at the input [28].
- Early-stage self-attention is replaced with depth-wise convolution (DWC), retaining local high-frequency detail while remaining computationally efficient [28].

Group 3: Performance Metrics and Results
- Max-Former reached 82.39% Top-1 accuracy on ImageNet with fewer parameters than Spikformer, a significant improvement [27].
- The architecture also cut energy consumption by over 30% while improving performance [30].
- The findings suggest that equipping SNNs with high-pass operators improves both performance and energy efficiency [31].

Group 4: Broader Implications
- The insight extends beyond Transformers: a Max-ResNet variant also benefited from the added high-frequency operations [33].
- The research offers a new perspective on SNN performance bottlenecks, arguing that SNN optimization should not merely mimic successful ANN designs [35].
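A toy 1-D example gives the intuition for why a Max-Pool branch reinjects high-frequency content. This is a sketch of the general signal-processing intuition, not Max-Former's actual Patch Embedding code: max pooling preserves sharp, spike-like transients that averaging smooths away.

```python
import numpy as np

def pool1d(x, k, op):
    # Non-overlapping 1-D pooling with window size k.
    return np.array([op(x[i:i + k]) for i in range(0, len(x) - k + 1, k)])

# A spike-train-like signal: mostly zero with sharp, high-frequency peaks.
x = np.zeros(16)
x[3] = x[10] = 1.0

max_pooled = pool1d(x, 4, np.max)    # sharp peaks survive pooling
avg_pooled = pool1d(x, 4, np.mean)   # peaks are attenuated toward zero
```

In frequency terms, the isolated peaks carry the signal's high-frequency energy; averaging attenuates them by the window size, while the max branch passes them through unchanged.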