量子位
DeepSeek Breaks the Google/OpenAI Monopoly Again: an Open-Source IMO Math Gold-Medal Model
量子位· 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focused on self-verifiable mathematical reasoning [1][7]
- The model achieved gold-medal-level scores at IMO 2025 and CMO 2024, and scored 118/120 on Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold-medal model, raising competitive pressure on companies like Google and OpenAI [4][5]

Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5-Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The model has 685 billion parameters and emphasizes strong proof-verification capabilities [7]

Training Methodology
- Training involves an iterative reinforcement learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A dataset of 17,500 proof-required math problems collected from AoPS competitions was used to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and assign scores on three levels of correctness [10]

Meta-Verification Mechanism
- A meta-verification mechanism was introduced to improve the verifier's accuracy by assessing the validity of the issues it identifies [14]
- The meta-verifier is trained on a dataset built from expert evaluations of the verifier's output [15]

Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs [27]

Automation and Efficiency
- The collaboration between verifier and generator yields a fully automated data-labeling process, replacing time-consuming manual annotation [29][35]
- The automated process maintains high consistency with expert evaluations, significantly improving efficiency [35]

Experimental Results
- The model's average quality score for proof analysis improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- The model's ability to generate correct proofs was validated through rigorous testing, showing superior performance across mathematical problem categories [34][39]
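The loop described above, a trained verifier scoring proofs and the generator revising until the verifier is satisfied, can be sketched in miniature. Everything below (the function names, the three-level scoring rule, the toy generator) is illustrative and assumes nothing about DeepSeek's actual implementation:

```python
# Toy stand-ins for the three correctness levels the verifier assigns.
# All names (verify, generate, self_correcting_generation) are
# hypothetical, not DeepSeek's actual API.
CORRECTNESS_LEVELS = [0.0, 0.5, 1.0]  # wrong / partially correct / fully correct

def verify(proof: str) -> float:
    """Toy verifier: grade a proof on three correctness levels.

    Here we simply reward longer 'proofs'; the real verifier is a
    trained LLM that lists issues and scores their severity."""
    if "QED" not in proof:
        return CORRECTNESS_LEVELS[0]
    return CORRECTNESS_LEVELS[2] if len(proof) > 20 else CORRECTNESS_LEVELS[1]

def generate(problem: str, attempt: int) -> str:
    """Toy generator: later attempts produce more complete proofs."""
    body = "step " * (attempt + 1)
    return f"{problem}: {body}QED"

def self_correcting_generation(problem: str, max_rounds: int = 5) -> tuple[str, float]:
    """Generator loop using the verifier as a reward model:
    keep revising until the verifier reports full correctness."""
    best_proof, best_score = "", 0.0
    for attempt in range(max_rounds):
        proof = generate(problem, attempt)
        score = verify(proof)          # verifier score acts as the RL reward
        if score > best_score:
            best_proof, best_score = proof, score
        if score == 1.0:               # verifier accepts: stop early
            break
    return best_proof, best_score
```

In the full system, this generate-verify-revise signal is what drives the alternating RL updates of both models; the sketch only shows the inference-time self-correction side.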
Double-Blind Review Disaster at Top Conferences! One Bug Leaked All Reviewer Information; ICLR, NeurIPS, and ACL All Affected…
量子位· 2025-11-28 01:53
Core Points
- A significant bug in the OpenReview system exposed the identities of reviewers for major computer science conferences, undermining the double-blind review process [2][4][19]
- The bug was reported on November 27, 2025, and fixed within an hour, but the damage was already done: reviewer information had been harvested [6][10][12]
- The incident has sparked discussion about the integrity of peer review and whether the double-blind system needs to be reassessed [21][25]

Group 1
- The bug allowed anyone to retrieve reviewers' personal information by inputting specific fields into an API link, affecting all conferences hosted on OpenReview [4][5][8]
- ICLR 2026 issued a statement condemning misuse of the leaked information and warned of severe consequences for any attempt to exploit the data [6][8][13]
- The incident triggered a surge of posts from authors identifying their reviewers, raising concerns about repercussions for the peer-review community [14][19][22]

Group 2
- The OpenReview team is analyzing API call logs to determine the extent of the breach and identify accounts that accessed sensitive information [12]
- The event has prompted calls for reviewer accountability, with some suggesting that irresponsible reviewers should lose their anonymity [24][25]
- The academic community is urged to reflect on the vulnerabilities of the current review system and the potential for reform [20][21]
Third Wave of Speakers Incoming! Join Us at MEET2026, Register Now
量子位· 2025-11-27 09:30
Core Points
- The MEET2026 Intelligent Future Conference will be held on December 10, 2025, in Beijing, focusing on AI and cutting-edge technology [1]
- More than 20 industry experts have confirmed their attendance, indicating strong interest and participation from key figures in the tech sector [2]
- The conference will feature major announcements, including the release of the AI Annual List and the Annual AI Trend Report [28][29]

Group 1: Conference Details
- MEET2026 aims to review the year's most noteworthy topics and anticipate future technology trends [1]
- The event is expected to attract thousands of tech professionals and millions of online viewers, establishing itself as a significant annual technology summit [33]

Group 2: Notable Speakers
- Dennis Yue, head of Google Cloud's enterprise and startup business in Greater China, brings over 30 years of experience in cloud computing and IT services [9]
- Yao Xin, co-founder and CEO of PPIO, has a strong background in AI cloud computing and previously founded a global internet-TV platform [14]
- Mao Jian, COO of Yunxi Technology, specializes in digital-transformation services and has over 20 years of management-consulting experience [18]
- Tu Jing, founder and CEO of Zhuoshijia Technology, has extensive experience in AI product design and commercialization [22]
- Zhao Tiancheng, CEO and chief scientist of Lianhui Technology, is recognized for his contributions to AI research and development [27]

Group 3: Awards and Reports
- The AI Annual List will evaluate companies, products, and individuals across three dimensions, and has become one of the most influential lists in the AI industry [29]
- The Annual AI Trend Report will identify and analyze ten significant AI trends based on technology maturity, implementation status, and potential value [30]
A Large Model Gets a "Neck" for the First Time! NYU Team Achieves 360-Degree Human-Like Visual Search
量子位· 2025-11-27 07:30
Core Insights
- The research introduces a new task, Humanoid Visual Search (HVS), enabling models to perform 360-degree visual searches in real-world environments such as train stations and shopping malls [6][10][12]
- A new benchmark, H*Bench, evaluates the search capabilities of intelligent agents in complex environments, moving beyond traditional simple household scenarios [7][8][9]
- The study aims to shift visual spatial reasoning from a "disembodied passive paradigm" to an "embodied active paradigm," integrating physical actions with visual reasoning [9][12]

Group 1: Humanoid Visual Search
- HVS lets intelligent agents autonomously rotate their heads to search for target objects or paths in immersive environments [6][12]
- The task covers two search problems, Humanoid Object Search (HOS) and Humanoid Path Search (HPS), with difficulty varying by target visibility and environmental cues [12][16]
- HOS involves locating and focusing on target objects, while HPS requires identifying navigable paths and adjusting body orientation [12][16]

Group 2: Benchmark and Dataset
- The H* dataset consists of roughly 3,000 labeled task instances drawn from diverse high-resolution panoramic videos, providing broad geographical coverage [21][22]
- The benchmark spans six main scene categories: retail environments, transportation hubs, urban streets, public institutions, offices, and entertainment venues [24]
- Initializing agents from four different starting directions yields 12,000 search rounds [22]

Group 3: Model Training and Performance
- The research frames HVS as a multi-modal reasoning task, using a policy network that integrates tool usage and head rotation to improve decision-making [17][28]
- Training substantially improves search accuracy: object search rises from 14.83% to 47.38% and path search from 6.44% to 24.94% [28]
- Larger model sizes do not guarantee better performance; smaller models outperform larger counterparts on certain tasks [33][34]

Group 4: Challenges and Insights
- Fundamental bottlenecks remain in advanced reasoning that requires physical, spatial, and social common sense, despite improvements in low-level perception and motion [34][36]
- HOS errors primarily stem from insufficient perception in cluttered environments, while HPS errors are more complex, reflecting a lack of physical and social common sense [36]
- Active visual search (rotating within panoramic views) proves more intuitive and effective than passive analysis of static images [36]
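The active-search idea, an agent rotating its head through a panorama until the target falls inside its field of view, can be illustrated with a minimal sketch. The field of view, rotation step, and function names below are assumptions for illustration, not the paper's implementation:

```python
# Toy active visual-search loop over a 360-degree panorama.
# FOV, step size, and all names are illustrative assumptions.
FOV_DEG = 90  # assumed field of view of the agent's camera

def in_view(heading_deg: float, target_deg: float, fov_deg: float = FOV_DEG) -> bool:
    """True if the target direction lies within the current field of view."""
    diff = (target_deg - heading_deg + 180) % 360 - 180  # signed angular difference
    return abs(diff) <= fov_deg / 2

def humanoid_object_search(start_deg: float, target_deg: float,
                           step_deg: float = 30, max_steps: int = 12):
    """Rotate the 'head' in fixed increments until the target is visible.

    Returns (final_heading, number_of_rotations), or None if not found."""
    heading = start_deg % 360
    for step in range(max_steps):
        if in_view(heading, target_deg):
            return heading, step
        heading = (heading + step_deg) % 360  # active head rotation
    return None
```

A learned policy would choose each rotation from visual input rather than sweeping a fixed increment; the fixed sweep here just makes the embodied search loop concrete.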
500 Million Yuan in Hot Money Pours into a Tsinghua AI-Infra Star: Maximizing Compute Efficiency to Build Agent Infrastructure
量子位· 2025-11-27 07:30
Core Insights
- Wuwen Xinqiong has raised nearly 500 million yuan in its A+ financing round, bringing total funding to roughly 1.5 billion yuan in just two and a half years and establishing it as a leading player in AI infrastructure [1][7][26]
- Founded by Tsinghua University alumni Wang Yu and Xia Lixue, the company has differentiated itself through full-stack AI system optimization and hardware-software synergy amid intense competition in the AI infrastructure space [1][8][26]

Financing Details
- The round was led by Zhuhai Technology Group and Futen Capital, with participation from several existing investors, indicating strong confidence in the company's direction [5][7]
- The financing represents dual endorsement from state-owned enterprises and market-oriented funds, affirming technological innovation aligned with national priorities [7][8]

Strategic Focus
- Funds from this round will go to three areas: extending technological advantages, promoting AI cloud products and terminal solutions, and increasing R&D investment in intelligent infrastructure [8][9]
- The company aims to build a first-class intelligent service platform with supporting cloud and terminal infrastructure, positioning itself at the forefront of the intelligent-agent era [9][10]

Technological Advancements
- The company has completed a transformation toward native infrastructure for intelligent-agent scenarios, proposing a systematic "Into Agent, With Agent, For Agent" strategy [10][11]
- It has built significant engineering capability, managing more than 25,000P of computing power across 26 cities and 53 core data centers, serving numerous leading clients and research institutions [12][20]

Product Development
- The company has launched two new Agentic AI products and several core underlying technologies, including the Infini-Megrez model and the Infini-Mizar inference-acceleration engine, which significantly enhance performance and efficiency [16][17]
- These innovations form a comprehensive "intelligent-agent infrastructure trio," addressing critical gaps in the deployment of intelligent agents [17][20]

Market Positioning
- Wuwen Xinqiong is recognized for systematic capabilities across cloud computing, terminal model inference, and intelligent-system frameworks, making it one of the few companies excelling in all three [20][26]
- The company is strategically positioned to support the integration of intelligent agents into both digital and physical worlds, aligning with national goals for AI development [27][28]
Quantum Bit (量子位) Is Hiring Editors and Writers
量子位· 2025-11-27 04:34
Core Insights
- The article highlights the ongoing AI boom and invites readers to participate through the platform "Quantum Bit" (量子位), which focuses on tracking AI advancements [1]
- Over eight years, the platform has accumulated significant influence, recognized for its industry resources and learning ecosystem [1]

Job Opportunities
- The company is hiring in three directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]
- All positions are full-time and based in Zhongguancun, Beijing [2]

AI Industry Direction
- Responsibilities include monitoring innovation in infrastructure, such as chips, AI infrastructure, and cloud computing [5]
- The role involves producing accessible interpretations of cutting-edge research papers and technical reports from major conferences [6][7]

AI Finance Direction
- Focuses on venture capital and financial reporting within the AI sector, tracking capital movement in the industry [6]
- Candidates should be data-sensitive and interested in financial statements and strategic planning [11]

AI Product Direction
- Responsibilities include evaluating AI applications and hardware and tracking new product releases across platforms [10]
- Candidates should have a strong understanding of trends in smart hardware and AI products [11]

Company Growth and Reach
- By 2025, Quantum Bit has over 2.4 million WeChat subscribers and more than 7 million users across the internet, with daily reading volume exceeding 2 million [12]
- Third-party data platforms rank it as the top new-media outlet in the AI and frontier-technology sector [12]
Moonshot AI Publishes Its RL Training Acceleration Method: Training Speed Surges 97%, Long-Tail Latency Drops 93%
量子位· 2025-11-27 04:34
Core Viewpoint
- The article introduces Seer, a new acceleration engine developed by Moonshot AI and Tsinghua University that significantly speeds up reinforcement learning (RL) training of large language models (LLMs) without altering the core training algorithms [1][8]

Performance Improvement
- Seer improves the rollout efficiency of synchronous RL by 74% to 97% and reduces long-tail latency by 75% to 93% [3][23]

Technical Architecture
Seer consists of three main modules:
1. Inference Engine Pool: built on DRAM/SSD, it holds multiple inference instances and a global KVCache pool for load balancing and data reuse [9]
2. Request Buffer: the unified entry point for all rollout requests, managing metadata and request states for precise resource scheduling [10]
3. Context Manager: maintains context views for all requests and generates scheduling decisions from context signals [11]

Key Technologies
- Divided Rollout: breaks responses into independent requests and segments, reducing memory fluctuation and load imbalance [12][13]
- Context-Aware Scheduling: uses a "speculative request" strategy to obtain length features for requests early, alleviating long-request delays [17]
- Adaptive Grouped Speculative Decoding: exploits similar response patterns within groups to build a dynamic reference library for draft generation, improving decoding efficiency [19]

Experimental Validation
- In experiments with models including Moonlight, Qwen2-VL-72B, and Kimi-K2, Seer achieved 74% to 97% higher throughput than the baseline system veRL, with significantly reduced long-tail latency [21][23]
- In the Moonlight task, the last 10% of requests took 3,984 seconds under veRL versus 364 seconds under Seer, an 85% reduction in long-tail latency [23]
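The divided-rollout idea, generating each response in fixed-size segments that are scheduled independently rather than in one long uninterrupted call, can be sketched as a simple round-robin scheduler. Names and the segment size below are illustrative assumptions, not Seer's actual API:

```python
from collections import deque

# Toy sketch of "divided rollout": split each response into fixed-size
# segments and schedule segments round-robin across requests, so long
# and short requests share capacity and memory usage stays smooth.
SEGMENT_TOKENS = 4  # assumed max tokens generated per scheduling step

def divided_rollout(request_lengths: dict[str, int]) -> list[tuple[str, int]]:
    """Simulate segment-level scheduling.

    request_lengths maps request id -> total tokens to generate.
    Returns the order in which (request_id, tokens_generated) segments ran."""
    queue = deque((rid, total) for rid, total in request_lengths.items())
    schedule = []
    while queue:
        rid, remaining = queue.popleft()
        step = min(SEGMENT_TOKENS, remaining)
        schedule.append((rid, step))        # run one segment of this request
        if remaining - step > 0:
            queue.append((rid, remaining - step))  # re-enqueue the tail
    return schedule
```

Note how a short request finishes after its first turn instead of waiting behind a long one; at scale this interleaving is what shrinks the long-tail latency of the slowest rollouts.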
Financing and Future Plans
- Moonshot AI is reportedly nearing completion of a new funding round, potentially raising several hundred million dollars and lifting its valuation to $4 billion [32][33]
- The company is in discussions with investment firms including IDG Capital and existing shareholder Tencent, aiming to close the round by year-end and begin an IPO process next year [36][37]
10,000 Code Files Turned into a Wiki in the Time It Takes to Play a Few Rounds of a Game!
量子位· 2025-11-27 04:34
Core Viewpoint
- The article reviews Qoder, a domestic AI programming tool that significantly improves the efficiency of understanding and managing large codebases in real software-development scenarios [10][11][12]

Group 1: Qoder's Features and Capabilities
- Qoder generates comprehensive project documentation, such as a wiki, from large codebases with minimal user intervention, letting developers focus on other tasks while the tool processes the information [7][8]
- It supports multiple programming languages and handles projects of up to 10,000 code files, cutting code-comprehension time from days to minutes [24]
- Quest Mode lets developers input natural-language specifications, which Qoder translates into detailed task plans and executes autonomously [25][26]

Group 2: Performance and Comparison
- In practical tests, Qoder showed superior understanding of code context, outperforming leading products by 13.22% in context-engineering capability [53]
- It offers a high cost-performance ratio: for the same spend, users complete 205% of the programming tasks possible with competitors [54]
- Integration with JetBrains and CLI tools lets developers use Qoder's features without changing their preferred development environments [30][44]

Group 3: Market Position and Future Plans
- Qoder passed 30,000 downloads shortly after its JetBrains plugin launch, indicating strong demand among backend developers [34]
- A team edition is planned for December, further expanding its market reach and functionality [56]
- Qoder has established a durability evaluation set for AI programming tools, providing a benchmark for real-world performance that is expected to grow in the coming months [58][60]
Audience Seats Going Fast! Lock In MEET2026 and Talk AI With Us | Latest Speaker Lineup
量子位· 2025-11-27 04:34
Core Insights
- The MEET2026 Smart Future Conference will focus on the cutting-edge technologies and industry developments that drew the most attention throughout the year [1]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies are penetrating industries, disciplines, and scenarios, becoming a core driving force of societal evolution [2]

Group 1: Conference Highlights
- The conference will cover this year's hot topics in tech, including reinforcement learning, multimodal AI, chip computing power, AI across industries, and AI going global [3]
- It will showcase the latest collisions between academic frontiers and commercial applications, featuring leading achievements spanning infrastructure, models, and products [4]
- The conference will also feature the authoritative release of the annual AI rankings and the annual AI trend report [5][93]

Group 2: Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, has a distinguished background in AI and digital-video technologies [11][12]
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national projects and has extensive experience in AI research [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in AI core technology development and has published over 100 papers [19]

Group 3: AI Trends and Rankings
- The 2025 AI Annual Rankings, initiated by Quantum Bit, will evaluate companies, products, and individuals across three dimensions, and have become one of the most influential rankings in the AI industry [94]
- The 2025 Annual AI Trend Report will analyze ten significant AI trends based on technological maturity, current implementation, and potential value, highlighting representative organizations and best cases [95]

Group 4: Event Details
- The MEET2026 Smart Future Conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [96]
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer of the smart-technology industry [98]
A New Primitive for Video LLMs: Reshaping Fine-Grained Perception and Referential Understanding with Object Tokens
量子位· 2025-11-27 04:34
Core Insights
- The article introduces VideoOrion, a video-understanding framework developed by a team from Peking University and UCSD that received high review scores (5/5/4) at ICCV 2025. The framework addresses the greater complexity of video relative to images by combining Object Tokens and Context Tokens for improved semantic understanding [1][2][3]

Group 1: Framework Overview
- VideoOrion encodes the salient spatiotemporal dynamics of the foreground as Object Tokens, processed in parallel with Context Tokens, yielding an efficient and interpretable video-understanding framework [3][4]
- Explicitly extracting object dynamics into discrete tokens reduces data volume and eases alignment with the language model [4][6]
- The core method uses dual-branch encoding and a "detect-segment-track" pipeline to produce Object Tokens, enabling fine-grained semantic integration at inference [6][10]

Group 2: Performance and Results
- VideoOrion outperforms existing models such as VideoLLaMA2/2.1 across multiple benchmarks, with improvements of +10.1% to +15.6% on various tasks [15][16]
- It scored 63.5 on MVBench, 65.1 on EgoSchema, 65.2 on Perception-Test, 54.6-55.3 on VideoMME, and 57.7/3.7 on ActivityNet-QA, a clear advantage over comparable models [16][17]
- The framework also supports video referring, allowing precise object identification in response to queries [16][18]

Group 3: Experimental Analysis
- Ablations show that the object branch significantly improves performance across benchmarks compared with models lacking it [19][20]
- Pre-training the object branch is crucial to overall effectiveness, suggesting Object Tokens need foundational semantic learning before alignment with text [20]
- The optimal number of Object Tokens is around 64, balancing information density against attention distribution [21]

Group 4: Limitations and Future Directions
- Acknowledged limitations include latency introduced by the specialized visual models and the need for further optimization to improve robustness and reduce pipeline cost [30]
- Future work will refine the alignment and integration strategies between object and scene perspectives, which is essential for advancing video question answering, retrieval, and multi-modal applications [26][30]
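The detect-segment-track idea, turning per-frame object observations into a handful of Object Tokens, can be sketched with stubs. Real systems use a detector, a segmenter, and a tracker; here every stage is a toy stand-in and all names are illustrative, not the paper's implementation:

```python
import numpy as np

def detect(frame: np.ndarray) -> list[np.ndarray]:
    """Stub detector: pretend each row of the frame is one object's feature."""
    return [row for row in frame]

def track(per_frame_objects: list[list[np.ndarray]]) -> list[list[np.ndarray]]:
    """Stub tracker: link object i in every frame into one trajectory."""
    n_objects = min(len(objs) for objs in per_frame_objects)
    return [[objs[i] for objs in per_frame_objects] for i in range(n_objects)]

def object_tokens(video: np.ndarray) -> np.ndarray:
    """Aggregate each tracked trajectory into a single Object Token
    by mean-pooling its features over time."""
    per_frame = [detect(frame) for frame in video]
    trajectories = track(per_frame)
    return np.stack([np.mean(traj, axis=0) for traj in trajectories])

# e.g. 8 frames, 3 objects, 16-dim features -> 3 Object Tokens of dim 16
video = np.random.rand(8, 3, 16)
tokens = object_tokens(video)
print(tokens.shape)  # (3, 16)
```

The point of the design is visible in the shapes: 8 x 3 x 16 frame features collapse to just 3 tokens, which is why Object Tokens reduce the data volume the language model must align with.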