量子位
Quantum Bit Is Hiring Editors and Writers
量子位· 2025-11-27 04:34
Core Insights
- The article highlights the ongoing AI boom and invites readers to take part in it through Quantum Bit, a platform focused on tracking AI advancements [1]
- Over 8 years the platform has built significant influence and is recognized for its industry resources and learning ecosystem [1]
Job Opportunities
- The company is hiring in three directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]
- All positions are full-time and based in Zhongguancun, Beijing [2]
AI Industry Direction
- Responsibilities include tracking innovations in underlying infrastructure such as chips, AI infra, and cloud computing [5]
- The role also involves producing accessible interpretations of cutting-edge research papers and technical reports from major conferences [6][7]
AI Finance Direction
- Focuses on venture capital and financial reporting in the AI sector, tracking capital movements across the industry [6]
- Candidates should be sensitive to data and interested in financial statements and strategic planning [11]
AI Product Direction
- Responsibilities include evaluating AI applications and hardware and tracking new product releases across platforms [10]
- Candidates should have a strong grasp of trends in smart hardware and AI products [11]
Company Growth and Reach
- As of 2025, Quantum Bit has over 2.4 million WeChat subscribers and more than 7 million users across the internet, with daily readership exceeding 2 million [12]
- Third-party data platforms rank it as the top new-media outlet covering AI and frontier technology [12]
Moonshot AI Unveils a Reinforcement Learning Training Acceleration Method: Training Speed Up 97%, Long-Tail Latency Down 93%
量子位· 2025-11-27 04:34
Core Viewpoint
- The article introduces Seer, a new acceleration engine developed by Moonshot AI and Tsinghua University that significantly speeds up reinforcement learning (RL) training of large language models (LLMs) without altering the core training algorithm [1][8]
Summary by Sections
Performance Improvement
- Seer improves the rollout efficiency of synchronous RL by 74% to 97% and reduces long-tail latency by 75% to 93% [3][23]
Technical Architecture
- Seer consists of three main modules:
1. **Inference Engine Pool**: multiple inference instances plus a global KVCache pool built on DRAM/SSD, used for load balancing and data reuse [9]
2. **Request Buffer**: the unified entry point for all rollout requests, managing metadata and request states for fine-grained resource scheduling [10]
3. **Context Manager**: maintains context views for all requests and generates scheduling decisions from context signals [11]
Key Technologies
- **Divided Rollout**: breaks responses down into independent requests and segments, reducing memory fluctuations and load imbalance [12][13]
- **Context-Aware Scheduling**: uses a "speculative request" strategy to obtain length estimates early and prioritize long requests, alleviating long-request delays (see the code sketch after this summary) [17]
- **Adaptive Grouped Speculative Decoding**: exploits similar response patterns within a group to build a dynamic reference library for draft generation, improving decoding efficiency [19]
Experimental Validation
- On models including Moonlight, Qwen2-VL-72B, and Kimi-K2, Seer delivered 74% to 97% higher throughput than the baseline system veRL, with markedly lower long-tail latency [21][23]
- In the Moonlight task, for example, the last 10% of requests took 3984 seconds under veRL versus 364 seconds under Seer, a reduction of roughly 91% [23]
Financing and Future Plans
- Moonshot AI is reportedly close to completing a new funding round that could raise several hundred million dollars and lift its valuation to $4 billion [32][33]
- The company is in talks with investors including IDG Capital and existing shareholder Tencent, aims to close the round by year-end, and plans to start an IPO process next year [36][37]
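To make the scheduling ideas above concrete, here is a minimal, self-contained sketch of how divided rollout combined with context-aware scheduling could look. It is a conceptual toy, not Seer's implementation: the class names, the 512-token segment size, and the group-length heuristic are all assumptions for illustration, and `generate_segment` stands in for a call into an inference engine.

```python
import heapq
from dataclasses import dataclass, field

# Toy sketch: rollout requests are generated in fixed-size segments ("divided
# rollout"), and requests whose group siblings have produced long outputs are
# scheduled first, so likely-long requests start (and finish) earlier.

SEGMENT_TOKENS = 512  # generate at most this many tokens per scheduling step

@dataclass(order=True)
class Request:
    priority: int                      # smaller = scheduled earlier (min-heap)
    req_id: str = field(compare=False)
    group_id: str = field(compare=False)
    prompt: str = field(compare=False)
    generated: int = field(default=0, compare=False)
    done: bool = field(default=False, compare=False)

class ContextManager:
    """Tracks per-group length statistics (a stand-in for Seer's context signals)."""
    def __init__(self):
        self.group_len = {}

    def observe(self, group_id, tokens_so_far):
        self.group_len[group_id] = max(self.group_len.get(group_id, 0), tokens_so_far)

    def estimate(self, group_id):
        return self.group_len.get(group_id, 0)

def schedule(requests, ctx, generate_segment):
    """Run requests segment by segment, longest-expected-remaining first."""
    buffer = []
    for r in requests:
        r.priority = -ctx.estimate(r.group_id)      # longer expected output -> scheduled earlier
        heapq.heappush(buffer, r)
    finished = []
    while buffer:
        r = heapq.heappop(buffer)
        new_tokens, r.done = generate_segment(r, SEGMENT_TOKENS)
        r.generated += new_tokens
        ctx.observe(r.group_id, r.generated)        # refresh the group's length signal
        if r.done:
            finished.append(r)
        else:
            r.priority = -(ctx.estimate(r.group_id) - r.generated)
            heapq.heappush(buffer, r)
    return finished
```

The point of the toy is the interaction between the two ideas: because generation proceeds in segments, length information accumulates early, and the priority queue keeps likely-long requests from being scheduled last and stretching the tail.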
Audience Seats Going Fast! Lock In MEET2026 and Let's Talk AI | Latest Speaker Lineup
量子位· 2025-11-27 04:34
Core Insights
- The MEET2026 Intelligent Future Conference will focus on the cutting-edge technologies and industry developments that drew the most attention this year [1]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and intelligent technologies are penetrating industries, disciplines, and scenarios, becoming a core driving force of societal evolution [2]
Group 1: Conference Highlights
- The conference will cover this year's hot topics in tech, including reinforcement learning, multimodal AI, chip computing power, AI in vertical industries, and AI going global [3]
- It will showcase the latest collisions between academic frontiers and commercial applications, featuring leading achievements across infrastructure, models, and products [4]
- The conference will also feature the authoritative release of the annual AI rankings and the annual AI trend report [5][93]
Group 2: Notable Speakers
- Zhang Yaqin, Dean of Tsinghua University's Institute for AI Industry Research (AIR) and an academician of the Chinese Academy of Engineering, has a distinguished background in AI and digital video technologies [11][12]
- Sun Maosong, Executive Vice Dean of Tsinghua University's Institute for Artificial Intelligence, has led numerous national projects and has extensive experience in AI research [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in AI core technology development and has published over 100 papers [19]
Group 3: AI Trends and Rankings
- The 2025 AI Annual Rankings, initiated by Quantum Bit, will evaluate companies, products, and individuals across three dimensions and have become one of the most influential rankings in the AI industry [94]
- The 2025 Annual AI Trend Report will analyze ten major AI trends based on technological maturity, current adoption, and potential value, highlighting representative organizations and best cases [95]
Group 4: Event Details
- The MEET2026 Intelligent Future Conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [96]
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [98]
A New Primitive for Video Large Models: Reshaping Fine-Grained Perception and Referring Understanding with Object Tokens
量子位· 2025-11-27 04:34
Core Insights
- The article introduces VideoOrion, a video understanding framework from a team at Peking University and UCSD that received high reviewer scores (5/5/4) at ICCV 2025. The framework tackles the extra complexity of video, relative to images, by combining Object Tokens and Context Tokens for better semantic understanding [1][2][3]
Group 1: Framework Overview
- VideoOrion encodes the salient spatiotemporal dynamics of the foreground as Object Tokens, which are processed in parallel with Context Tokens, yielding an efficient and interpretable video understanding framework [3][4]
- The framework explicitly distills object dynamics into discrete tokens, reducing data volume and easing alignment with the language model (LLM) [4][6]
- The core method uses dual-branch encoding and a "detect-segment-track" pipeline to produce Object Tokens, allowing detailed semantics to be integrated at inference time (see the sketch after this summary) [6][10]
Group 2: Performance and Results
- VideoOrion outperforms existing models such as VideoLLaMA2/2.1 across multiple benchmarks, with improvements of +10.1% to +15.6% on various tasks [15][16]
- It reports 63.5 on MVBench, 65.1 on EgoSchema, 65.2 on Perception-Test, 54.6/55.3 on VideoMME, and 57.7/3.7 on ActivityNet-QA, a clear advantage over comparable models [16][17]
- The framework also supports video referring, allowing precise object identification in response to queries [16][18]
Group 3: Experimental Analysis
- Ablations show that the object branch significantly improves performance across benchmarks compared with models without it [19][20]
- Pre-training the object branch is crucial to overall effectiveness, suggesting that Object Tokens need foundational semantic learning before being aligned with text [20]
- Around 64 Object Tokens is identified as the optimal number, balancing information density against attention dilution [21]
Group 4: Limitations and Future Directions
- Acknowledged limitations include added latency from the specialized vision models and the need for further optimization to improve robustness and reduce pipeline cost [30]
- Future work will focus on better alignment and integration between the object and scene views, which matters for video question answering, retrieval, and multimodal applications [26][30]
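Below is a minimal sketch of the dual-branch idea described above: a context branch pools frame features into Context Tokens, while an object branch pools per-object tubelet features (assumed to come from an external detect-segment-track pipeline) into Object Tokens, and both are projected into the LLM embedding space. The module names, dimensions, and token budgets are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

# Toy dual-branch video encoder in the spirit of VideoOrion: Context Tokens from
# frame features, Object Tokens from per-object tubelet features, both projected
# to the LLM embedding dimension and concatenated.

class DualBranchVideoEncoder(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=4096, num_ctx_tokens=256, num_obj_tokens=64):
        super().__init__()
        self.num_ctx_tokens = num_ctx_tokens
        self.num_obj_tokens = num_obj_tokens
        self.ctx_proj = nn.Linear(vis_dim, llm_dim)   # context-branch projector
        self.obj_proj = nn.Linear(vis_dim, llm_dim)   # object-branch projector

    def forward(self, frame_feats, object_feats):
        # frame_feats:  (T, N_patches, vis_dim)  per-frame patch features
        # object_feats: (K, T, vis_dim)          per-object tubelet features
        ctx = frame_feats.mean(dim=1)                  # (T, vis_dim): pool patches per frame
        ctx = ctx[: self.num_ctx_tokens]               # truncate to the context token budget
        ctx_tokens = self.ctx_proj(ctx)

        obj = object_feats.mean(dim=1)                 # (K, vis_dim): one token per object track
        obj = obj[: self.num_obj_tokens]
        obj_tokens = self.obj_proj(obj)

        # The LLM sees scene-level context tokens followed by object tokens.
        return torch.cat([ctx_tokens, obj_tokens], dim=0)

# usage sketch
enc = DualBranchVideoEncoder()
frames = torch.randn(32, 196, 1024)   # 32 frames, 196 patches each
objects = torch.randn(8, 32, 1024)    # 8 tracked objects over 32 frames
visual_tokens = enc(frames, objects)  # (40, 4096), prepended to the text prompt
```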
NeurIPS 2025 Results Are Out: Alibaba Qwen's Gated Attention Wins Best Paper, Kaiming He's Faster R-CNN Wins the Test of Time Award
量子位· 2025-11-27 03:00
Core Insights
- NeurIPS 2025 awarded Best Paper to four papers, three of them authored by Chinese researchers, including the Gated Attention paper from Alibaba's Qwen team [1][2][6]
Group 1: Best Papers
- The four Best Papers cover breakthroughs in diffusion-model theory, self-supervised reinforcement learning, attention mechanisms and reasoning in large language models, online learning theory, neural scaling laws, and diversity benchmarking for language models [2]
- The first paper, "Artificial Hivemind," examines the lack of diversity in large language models, finding substantial repetition within models and homogeneity across models, with over 60% of responses showing similarity above 0.8 [7][8][16]
- The second paper, "Gated Attention for Large Language Models," studies why gated attention works, showing that specific gating strategies improve model performance and training stability (a minimal sketch follows this summary) [17][20][24]
Group 2: Test of Time Award
- The Test of Time Award went to Faster R-CNN, a deep learning model for object detection that dramatically improved detection speed and achieved near real-time performance [3][4][48]
- Faster R-CNN introduced a Region Proposal Network (RPN) that shares convolutional features with the detection network, removing the proposal-generation bottleneck of earlier object detectors [52]
- The framework achieved state-of-the-art detection accuracy on datasets including PASCAL VOC and MS COCO and shaped subsequent developments in computer vision [53][55]
Group 3: Research Findings
- The self-supervised reinforcement learning paper shows that increasing network depth can boost performance, with up to a 50-fold improvement in certain environments [25][29][31]
- The diffusion-model work identifies critical training time scales for generalization versus memorization, showing that stopping training within a specific window can prevent overfitting [40][44]
- The findings also suggest that depth expansion is more compute-efficient than width expansion, and that jointly deepening actor and critic networks yields complementary gains [34][36]
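For readers unfamiliar with the mechanism behind the Qwen Best Paper, the sketch below shows one common form of output-gated attention: a sigmoid gate computed from the layer input is applied element-wise to the attention output before the output projection. The module layout and hyperparameters are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal output-gated attention: standard multi-head attention followed by an
# element-wise sigmoid gate (conditioned on the layer input) before the output
# projection.

class GatedAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, d_model)   # produces a per-channel gate
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, d_head)
        q, k, v = (t.view(b, s, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, s, -1)     # back to (batch, seq, d_model)
        g = torch.sigmoid(self.gate(x))                   # gate conditioned on the layer input
        return self.out(g * attn)                         # gate applied before output projection

x = torch.randn(2, 16, 512)
y = GatedAttention()(x)        # (2, 16, 512)
```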
No Body, No AGI! Hillbot's Su Hao in Conversation with Qianxun's Gao Yang: The Embodied-Intelligence Bubble Is Big, but the Progress Is Real
量子位· 2025-11-27 03:00
Core Viewpoints
- The discussion argues that embodied intelligence is essential for achieving artificial general intelligence (AGI) [2][19]
- The path to AGI requires physical interaction with the environment, which embodied intelligence provides [21][23]
Group 1: Insights from the Experts
- Su Hao asserts that without embodied intelligence there can be no general physical intelligence, and therefore no general intelligence [2][16]
- Gao Yang stresses that scaling data is crucial for solving problems in embodied intelligence; the essence of the challenge has not changed [3][10]
- Both agree that embodied intelligence is a key entry point for understanding AGI [3][4]
Group 2: Challenges and Opportunities
- The conversation covers the technical bottlenecks in the evolution of embodied intelligence and the structural advantages China holds in this field [7][24]
- Real-world data is critical for training models, and China's data-iteration efficiency gives it a significant advantage over the U.S. [27][28]
- The integration of hardware and software design is seen as critical to the success of embodied intelligence [26][30]
Group 3: Future Predictions
- The next major breakthrough in embodied intelligence may arrive within 2-3 years, notably an embodied model comparable to GPT-3.5 [41][39]
- Achieving AGI will be a continuous process of multiple breakthroughs rather than a single event [38][40]
- The current state of embodied intelligence is characterized by both real progress and considerable hype [31][32]
Surpassing Boston Dynamics' Hydraulic Robots in Explosive Power: PHYBOT M1 Achieves the World's First Perfectly Human-Like Backflip by a Full-Size Heavy-Duty Electric-Drive Humanoid
量子位· 2025-11-26 09:33
Core Viewpoint
- The article covers PHYBOT M1, a full-size humanoid robot developed by Dongyi Technology that has completed a perfect backflip, showcasing advanced dynamic performance and the potential to surpass human capabilities in complex environments [4][5][8]
Group 1: Technological Innovations
- PHYBOT M1 is the world's first full-size heavy-duty electric-drive robot to perform a perfect backflip, demonstrating explosive power exceeding that of Boston Dynamics' hydraulic Atlas [5][8]
- The robot is designed to validate, and push the limits of, the core capabilities humanoid robots need to operate in real high-intensity production environments [7][8]
- Its electric drive system surpasses traditional hydraulic systems in dynamic performance, delivering peak joint torque above 800 N·m and total peak power output above 10,000 W [16]
Group 2: Challenges in Humanoid Robotics
- The industry struggles to balance dynamics, structure, and control in full-size humanoids; most high-dynamic robots are smaller because their lower inertia makes the center of gravity easier to control [9][10][11]
- Achieving high-dynamic motion in larger humanoids requires significant increases in joint peak torque, energy density, and structural durability [11][12]
Group 3: Advanced Algorithms and Training
- Dongyi Technology developed three new algorithms that let PHYBOT M1 perform complex maneuvers such as backflips despite its high inertia and the need for precise control [17][19][20]
- Training uses a dual-stage method that keeps the robot safe while operating at its limits and improves sim-to-real transfer of learned policies [21][22]
Group 4: Future Implications
- The successful backflip is seen as a milestone in the evolution of robotics, marking the shift from basic locomotion to advanced dynamic movement [23]
- Dongyi Technology aims to turn humanoid robots from mere technology demonstrations into reliable productivity tools across industries [24]
Open-Source Model Takes On Nano Banana Pro! The Original Stable Diffusion Team Is Back
量子位· 2025-11-26 09:33
Core Insights
- The article covers the launch of Flux.2, a new AI image generation model from Black Forest Labs that aims to rival Google's Nano Banana Pro by offering comparable image quality at a lower cost [1][42]
Group 1: Product Features
- Flux.2 is positioned as a productivity tool that expands what users can do with image generation [2]
- The model supports multiple reference images, enabling complex generation tasks such as fashion-editorial images with consistent characters [3]
- Flux.2 ships in several versions, including Flux.2 [pro], [flex], [dev], and an upcoming [klein], each tailored to different user needs and performance requirements [16][17]
Group 2: Performance Comparison
- Initial tests put the [pro] version's generation time at under 10 seconds, with support for up to 10 reference images [17]
- Flux.2 shows clear gains in instruction following and fine-grained control, but still trails Nano Banana Pro in overall image quality [39][40]
- Users report that Flux.2 handles tasks such as photo restoration and image editing well, often producing more natural results than Nano Banana 2 [46][48]
Group 3: Market Positioning
- Flux.2 is positioned as a cost-effective alternative to Google's models, offering high-quality output at a lower price point, which appeals to users put off by Nano Banana Pro's cost [42]
- It supports high-resolution image editing up to 4MP for users who need detailed output [44]
- The article also recalls that Flux.1 was a benchmark in AI image generation before Flux.2's arrival [56][59]
Brand-New Sparse Attention Optimization! Inside the Core Technology of Tencent's Ultra-Lightweight Video Generation Model HunyuanVideo 1.5
量子位· 2025-11-26 09:33
Core Insights
- Tencent's HunyuanVideo 1.5 has been officially released and open-sourced: a lightweight video generation model built on the Diffusion Transformer (DiT) architecture with 8.3 billion parameters, capable of generating 5-10 seconds of high-definition video [1][2]
Model Capabilities
- The model supports both text-to-video and image-to-video generation, with strong image-video consistency, and can accurately follow diverse instructions covering scene composition, camera movement, and character emotion [5][7]
- It natively generates 480p and 720p video and can upscale to 1080p cinematic quality via a super-resolution model; developers and creators can run it on consumer graphics cards with 14GB of memory [6]
Technical Innovations
- HunyuanVideo 1.5 balances generation quality, performance, and model size through layered technical innovations built on a two-stage framework [11]
- Stage one uses an 8.3B-parameter DiT model for multi-task learning; stage two improves visual quality with a video super-resolution model [12]
- A lightweight, high-performance architecture achieves strong compression and efficiency, delivering leading generation results with comparatively few parameters [12]
- An innovative sparse attention mechanism, SSTA (Selective and Sliding Tile Attention), cuts the computational cost of long video sequences, improving generation efficiency by 1.87x over FlashAttention3 (a toy illustration of the sliding-tile idea follows this summary) [15][16]
Training and Optimization
- A large model serves as the text encoder, strengthening multimodal understanding and the accuracy of text elements in generated video [20]
- A full-pipeline training optimization strategy, spanning pre-training through post-training, improves motion coherence and aesthetic quality [20]
- Reinforcement learning strategies are tailored separately to image-to-video (I2V) and text-to-video (T2V) tasks to correct artifacts and improve motion quality [23][24]
Use Cases
- Example videos include cinematic scenes such as a bustling Tokyo intersection and a cyberpunk-themed street corner, showing the model's ability to create visually appealing, contextually rich content [29][30]
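As a rough illustration of what a sliding-tile sparsity pattern means, the toy below builds a block-banded attention mask in which each tile of tokens attends only to itself and its neighboring tiles. It is a conceptual stand-in for the idea behind SSTA, not Tencent's implementation: the tile size and window are made-up values, and the "selective" component (skipping redundant tiles) is omitted.

```python
import torch

# Toy sliding-tile attention mask: video tokens are grouped into fixed-size
# tiles, and each tile only attends to itself and a window of neighboring tiles.

def sliding_tile_mask(num_tokens, tile_size=64, window=1):
    tile_id = torch.arange(num_tokens) // tile_size           # tile index of every token
    # token i may attend to token j iff their tiles are within `window` of each other
    mask = (tile_id[:, None] - tile_id[None, :]).abs() <= window
    return mask                                                # (num_tokens, num_tokens) bool

mask = sliding_tile_mask(4096)
density = mask.float().mean().item()
print(f"attended fraction: {density:.2%}")   # only a small fraction of token pairs is computed
```

In practice such a pattern would drive a block-sparse attention kernel rather than a dense mask, which is where the efficiency gain over full attention comes from.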