Structured preprocessing boosts DeepSeek's accuracy by 51%, now open source | Tsinghua & DeepLang (深言)
量子位· 2026-01-05 05:00
Core Insights
- The article introduces LingoEDU, a new method that improves DeepSeek's accuracy by 51% (relative) through a structured approach to information processing [1][7][46]
- LingoEDU builds a clear semantic structure that allows every piece of information to be traced back to its original source, addressing the "hallucination" problem in AI-generated content [5][44]

Group 1: Methodology and Implementation
- LingoEDU employs a preprocessing model that segments text into Elementary Discourse Units (EDUs) and assigns each unit a unique index marker for accurate referencing (a minimal code sketch follows this summary) [1][5][21]
- Context is structurally pre-processed before it enters the main model, improving the efficiency and accuracy of information generation [2][10]
- By organizing the units into a semantic tree, LingoEDU ensures that every generated output can be traced back to the original text, enhancing the reliability of AI outputs [4][46]

Group 2: Experimental Results
- Experimental results indicate that LingoEDU significantly outperforms baseline models in segmentation accuracy, cost, and efficiency [7][35]
- In a comparative study, DeepSeek-R1's accuracy improved from 9.0% to 13.6% after applying LingoEDU, a 51% relative increase [7][40]
- The method was tested on a dataset of 248 articles and achieved better tree edit distance (TED) and document-level accuracy (DLA) than existing models [34][35]

Group 3: Advantages and Value Proposition
- LingoEDU preserves the semantic integrity of the original text while providing a structured format that improves information management and reduces processing costs [6][45]
- The approach addresses the industry-wide challenge of AI hallucination by making AI-generated content both accurate and traceable [44][46]
- LingoEDU is positioned as a technology that moves AI applications from "black box" models toward more interpretable and controllable systems, setting a new standard for reliable AI [46][47]
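As a minimal sketch of the indexing-and-traceback idea from Group 1 (not the open-sourced LingoEDU code): split a document into pseudo-EDUs, tag each with a unique index marker, and resolve a cited marker back to its source span. The punctuation-based splitter, the `E{n}` marker format, and the function names are assumptions for illustration; the real system uses a trained preprocessing model and organizes units into a semantic tree.

```python
# Illustrative sketch only: a naive sentence splitter stands in for the
# trained preprocessing model so the indexing-and-traceback idea is concrete.
import re
from dataclasses import dataclass

@dataclass
class EDU:
    index: str   # unique marker, e.g. "E3"
    text: str    # the discourse unit's original wording

def segment_into_edus(document: str) -> list[EDU]:
    """Split a document into pseudo-EDUs and tag each with an index marker."""
    # Hypothetical splitting rule: break after sentence-final punctuation.
    spans = [s.strip() for s in re.split(r"(?<=[。.!?!?])\s*", document) if s.strip()]
    return [EDU(index=f"E{i}", text=span) for i, span in enumerate(spans, start=1)]

def trace_back(citation: str, edus: list[EDU]) -> str:
    """Resolve an index marker cited by the main model back to its source text."""
    lookup = {edu.index: edu.text for edu in edus}
    return lookup.get(citation, "<unknown EDU index>")

if __name__ == "__main__":
    doc = "LingoEDU segments text into discourse units. Each unit gets an index. Answers cite those indices."
    edus = segment_into_edus(doc)
    for edu in edus:
        print(edu.index, edu.text)
    # An answer citing [E2] can be traced to its exact source span:
    print("E2 ->", trace_back("E2", edus))
```

The point of the exercise is the lookup in `trace_back`: once every unit carries a stable index, any generated claim that cites an index can be checked against the original wording, which is what makes hallucinated content detectable.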
Huawei open-sources a 7B multimodal model with strong visual grounding and OCR; a new "sweet spot" for your Ascend edge devices has arrived
量子位· 2026-01-05 05:00
Core Viewpoint
- Huawei has launched the open-source model openPangu-VL-7B, targeting key scenarios in edge deployment and personal development and showcasing lightweight, high-performance capabilities [3][24]

Group 1: Model Features and Performance
- openPangu-VL-7B is designed for a range of terminal scenarios and excels at tasks such as image information extraction, document understanding, video analysis, and object localization [2][7]
- The model achieves a latency of only 160 milliseconds for single-image inference on a single Ascend Atlas 800T A2 card, enabling real-time inference at 5 FPS, with a training-phase MFU of 42.5% [4]
- During pre-training, the model completed stable training on over 3 trillion tokens, providing a practical reference for developers working with Ascend clusters [5]

Group 2: Benchmarking and Comparison
- Across core tasks, openPangu-VL-7B outperforms other models of similar scale, demonstrating strong overall capability [7]
- Benchmark results include general visual question answering (MMBench V1.1_DEV: 86.5), OCR and document understanding (OCRBench: 907), and video understanding (MVBench: 74.0) [8]

Group 3: Technical Innovations
- The model features a high-performance visual encoder optimized for Ascend hardware, achieving a 15% throughput improvement over traditional GPU-optimized encoders [15]
- A mixed training scheme of "weighted per-sample loss + per-token loss" balances learning across samples of varying lengths, improving the model's handling of both long and short responses (a hedged sketch of this idea follows the summary) [17][19]
- A distinctive positioning data format improves accuracy and efficiency in visual localization tasks [20][21]

Group 4: Market Implications
- The open-source release of openPangu-VL-7B gives Ascend users a lightweight, high-performance, versatile multimodal model, enriching the Ascend ecosystem and stimulating innovation [24]
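The "weighted per-sample loss + per-token loss" scheme mentioned under Group 3 can be made concrete with a small sketch. This is not Huawei's implementation: the mixing weight `alpha`, the normalization details, and the function name `mixed_loss` are assumptions; only the contrast between the two averaging schemes is taken from the description above.

```python
# Sketch of mixing per-sample and per-token loss averaging (assumed details).
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, pad_id: int, alpha: float = 0.5):
    """
    logits:  [batch, seq_len, vocab]
    targets: [batch, seq_len], with `pad_id` marking padding positions.
    Per-token averaging lets long samples dominate the gradient; per-sample
    averaging weights every sample equally regardless of length. Mixing the
    two balances learning across short and long responses.
    """
    vocab = logits.size(-1)
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1),
        ignore_index=pad_id, reduction="none",
    ).reshape(targets.shape)                         # [batch, seq_len]
    mask = (targets != pad_id).float()

    # Per-token: one average over every non-pad token in the batch.
    per_token = (token_loss * mask).sum() / mask.sum().clamp(min=1)

    # Per-sample: average within each sample first, then across samples.
    per_sample = ((token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)).mean()

    return alpha * per_sample + (1 - alpha) * per_token
```

With `alpha=0` this reduces to the usual token-averaged cross entropy; with `alpha=1` every response contributes equally no matter how long it is, which is the balance the report says the mixed scheme is after.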
After a 3.5 billion RMB raise, a mysterious Kimi model appears in the arena
量子位· 2026-01-05 05:00
Core Viewpoint
- A new model named Kiwi-do, speculated to come from Kimi, has appeared in the large-model arena, drawing attention ahead of its release and for its potential multi-modal capabilities [1][19]

Group 1: Model Development and Performance
- Kiwi-do is thought to be linked to Kimi's previously mentioned K2-VL model, with indications that it has passed the Visual Physics Comprehension Test (VPCT), showcasing its ability to solve complex visual tasks [15][17]
- Its performance on SVG drawing tasks has been compared with K2-Thinking, revealing clear differences in output quality [4][8]
- There is speculation that Kiwi-do may be a smaller-parameter model, which could indicate a deliberate strategy in model development [12][13]

Group 2: Funding and Strategic Goals
- Kimi recently announced a $500 million (approximately 3.5 billion RMB) Series C round led by IDG, with participation from major investors such as Alibaba and Tencent, at a post-money valuation of $4.3 billion [21][22]
- The funds will be used to aggressively expand GPU resources and accelerate training and development of the K3 model, with the long-term goal of becoming a leading AGI company [24][25]
- Kimi's financing approach differs from peers in the sector: it is not currently pursuing an IPO, relying instead on private-market funding to support its growth strategy [27][28]

Group 3: Market Position and Future Outlook
- Kimi aims to use the funding to strengthen its computational capabilities, which are critical in a large-model industry where operational costs are substantial [25][26]
- The company plans to time an IPO strategically in the future as a way to further accelerate its AGI ambitions [29]
- The K3 model is expected to deliver a significant leap in pre-training performance, aiming to match world-leading models and enhance user experience through innovative training techniques [32]
The "AI 100" list opens its call for entries; the AI product "annual gathering" can't stop | Quantum Bit Think Tank
量子位· 2026-01-05 05:00
Core Insights
- The article discusses the many keywords that have emerged in China's AI product sector in 2025, highlighting the rapid evolution and innovation in AI technologies and applications [4]
- The "AI 100" list by Quantum Bit Think Tank aims to evaluate and recognize the top AI products representing China's AI capabilities, focusing on both current leaders and future potential [12]

Group 1: AI 100 List Overview
- The "AI 100" list is divided into three main categories: "Flagship AI 100," "Innovative AI 100," and the top three products in ten popular sub-sectors [6]
- The "Flagship AI 100" focuses on the strongest AI products of 2025, emphasizing those that demonstrate significant technological breakthroughs and practical value [7]
- The "Innovative AI 100" aims to identify emerging products in 2025 that have the potential to lead industry changes in 2026 [8]

Group 2: Sub-sector Focus
- The ten sub-sectors for the top-three nominations are AI Browser, AI Agent, AI Smart Assistant, AI Workbench, AI Creation, AI Education, AI Healthcare, AI Entertainment, Vibe Coding, and AI Consumer Hardware [9]

Group 3: Application and Evaluation Criteria
- The evaluation of the "AI 100" list combines quantitative and qualitative assessments, focusing on user data such as user scale, growth, activity, and retention, as well as hardware product shipment volumes [13]
- Qualitative assessment weighs long-term development potential through expert evaluations and user surveys, examining factors such as underlying technology, market space, functionality, monetization potential, team background, and growth speed [13]
Rumors swirl that Unitree's (Yushu Technology's) IPO has stalled; Wang Xingxing: don't take them seriously, and there's no need to explain to outsiders
量子位· 2026-01-05 03:22
Core Viewpoint
- The company, Yushu Technology, has clarified that it has not applied for a "green channel" for its A-share listing and that its listing process is proceeding normally despite rumors to the contrary [2][10][11]

Group 1: Clarification on Listing Rumors
- Yushu Technology responded to media reports claiming that its A-share listing green channel had been halted, stating that these reports were misleading and damaging to its reputation [10][12]
- The company emphasized that it has not applied for a "green channel" and that its listing work is progressing as planned [10][11]
- Yushu Technology has reported the misleading information to the relevant authorities and reserves the right to pursue legal action against those spreading false reports [10][11]

Group 2: Recent Developments
- On January 4, Yushu Technology released a training video of its humanoid robot H2, showcasing its capabilities, which happened to coincide with the discussion around its listing [3][4][15]
- The video featured a character resembling the company's founder, Wang Xingxing, which sparked discussion among viewers [5][6]
- Following the video's release, the misleading reports regarding the green channel were taken down [17]

Group 3: Listing Timeline
- Yushu Technology submitted its counseling registration materials on July 8, 2025, with CITIC Securities as the counseling institution [18]
- On September 2, 2025, the company announced that it was actively preparing for its initial public offering (IPO), with plans to submit listing application documents between October and December 2025 [19]
- The company completed its IPO counseling work on November 15, 2025, and intends to apply for an IPO in the domestic market [25][26]
ByteDance Seed: large concept models are here; why must the unit of reasoning be the next token?
量子位· 2026-01-04 11:00
henry, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Why must the next reasoning unit of an LLM be a token?

ByteDance's Seed team has just released its latest research, DLCM (Dynamic Large Concept Models), which dynamically and adaptively raises the reasoning unit of large models from the token (word) level to the concept level.

DLCM learns semantic boundaries end to end, dynamically segments token sequences into concepts, performs deep reasoning in the compressed concept space, and then reconstructs the concept-level reasoning results into token-level predictions via causal cross-attention.

In this way, the compute allocation of traditional LLMs, which follows the uniform and redundant information density of tokens, is replaced by concept-oriented dynamic reasoning with adaptive allocation of compute.

On reasoning-focused benchmarks, DLCM reduces inference-stage FLOPs by 34% while raising average accuracy by 2.69%.

This means the reasoning efficiency of large models does not have to rely on denser token-level computation; it can instead come from higher-level semantic organization.

Let's look at the details.

A hierarchical next-token prediction framework

As noted above, the core of DLCM is learning a dynamic token-to-concept mapping, enabling adaptive allocation of compute.

Next, in the dynamic segmentation stage, the model uses token-level representations to compute, between adjacent tokens, ...
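The excerpt above breaks off while describing the dynamic segmentation stage. As a rough illustration of the general idea rather than the paper's method, the sketch below scores adjacent token representations for a semantic boundary and pools each resulting span into a concept vector. The cosine-dissimilarity score, the fixed 0.5 threshold, mean pooling, and the function names are all assumptions; DLCM learns its boundaries end to end.

```python
# Illustrative sketch of dynamic token-to-concept segmentation (not the
# released DLCM implementation; boundary scoring and pooling are assumed).
import torch
import torch.nn.functional as F

def segment_tokens_into_concepts(hidden: torch.Tensor, threshold: float = 0.5):
    """
    hidden: [seq_len, d_model] token-level representations.
    Returns a list of (start, end) index pairs, one per concept span.
    """
    # Score each adjacent pair of tokens; a high score suggests a semantic boundary.
    sims = F.cosine_similarity(hidden[:-1], hidden[1:], dim=-1)   # [seq_len - 1]
    boundary_scores = (1 - sims) / 2                              # map to [0, 1]

    spans, start = [], 0
    for i, score in enumerate(boundary_scores.tolist()):
        if score > threshold:            # cut after token i
            spans.append((start, i + 1))
            start = i + 1
    spans.append((start, hidden.size(0)))
    return spans

def pool_concepts(hidden: torch.Tensor, spans):
    """Mean-pool the tokens within each span into one concept-level vector."""
    return torch.stack([hidden[s:e].mean(dim=0) for s, e in spans])

if __name__ == "__main__":
    h = torch.randn(12, 16)              # 12 tokens, 16-dim states (toy sizes)
    spans = segment_tokens_into_concepts(h)
    concepts = pool_concepts(h, spans)
    print(spans, concepts.shape)         # deep reasoning would then run over `concepts`
```

The payoff claimed in the article comes from the shorter sequence: reasoning over the pooled concept vectors costs fewer FLOPs than reasoning over every token, and a cross-attention step maps the results back to token-level predictions.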
New MIT paper: reasoning models will be outdated in 2026; "nesting-doll" models should take their place
量子位· 2026-01-04 09:06
Core Viewpoint
- The article discusses a new paradigm in language models called the "Recursive Language Model" (RLM), which significantly improves the handling of long texts and reduces costs compared with traditional models like GPT-5 [3][5][23]

Group 1: RLM Overview
- The RLM stores text in a code environment and lets the model write programs that recursively call the model itself to process the text (a hedged sketch of this pattern follows the summary) [5][9]
- This method decouples the length of the input data from the model's context window size, so the amount of text that can be processed is limited only by physical memory rather than by the constraints of the Transformer architecture [10][12]

Group 2: Performance Metrics
- RLM has demonstrated the ability to handle up to 10 million tokens, surpassing the context window of leading models like GPT-5 by two orders of magnitude [23]
- In benchmark tests on complex tasks, RLM achieved F1 scores of 58.00% and 23.11% on the OOLONG and OOLONG-Pairs tests, respectively, while traditional models scored below 0.1% [27]

Group 3: Cost Efficiency
- Because RLM selectively reads only the relevant text segments, its operating costs drop significantly: the average cost on the BrowseComp-Plus benchmark was only $0.99, compared with $1.50 to $2.75 for GPT-5 [29][31]
- This cost efficiency indicates that RLM can maintain performance while controlling inference costs, making it a viable option for large-scale applications involving long texts [32]
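A minimal sketch of the recursive-call pattern described under Group 1, under heavy assumptions: `llm()` is a placeholder for whatever completion API is in use, the character budget stands in for a real context window, and the splitting is naive halving, whereas the actual RLM lets the model write and execute its own recursion inside a code environment.

```python
# Sketch only: fixed recursion script standing in for model-written programs.
CONTEXT_BUDGET = 8_000   # rough character budget standing in for a context window

def llm(prompt: str) -> str:
    """Placeholder for a call to an underlying language model."""
    raise NotImplementedError("wire this to a real model API")

def recursive_answer(question: str, text: str, depth: int = 0) -> str:
    # Base case: the text fits in the budget, so answer directly from it.
    if len(text) <= CONTEXT_BUDGET:
        return llm(f"Answer using only this text.\n\nText:\n{text}\n\nQuestion: {question}")

    # Recursive case: split the oversized text, query each half recursively,
    # then merge the partial answers with one more call.
    mid = len(text) // 2
    left = recursive_answer(question, text[:mid], depth + 1)
    right = recursive_answer(question, text[mid:], depth + 1)
    return llm(
        "Combine these two partial answers into one final answer.\n"
        f"Question: {question}\nPartial A: {left}\nPartial B: {right}"
    )
```

Each individual call only ever sees a budget-sized slice, which is why the total input length is bounded by memory rather than by the model's context window; and when the model writes the program itself it can skip irrelevant branches entirely, which is where the cost savings reported under Group 3 would come from.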
OpenAI's first hardware takes shape as a pen! Netizens: just call it the oPen
量子位· 2026-01-04 07:25
Core Viewpoint
- OpenAI's first AI hardware product is an "AI pen," which exceeds expectations and aims to enhance user interaction with AI technology [1][6][12]

Group 1: Product Features
- The AI pen is designed to facilitate two-way communication with ChatGPT through paired devices like smartphones [4][10]
- It is described as being similar in size to an iPod Shuffle, weighing approximately 10-15 grams [7]
- The pen is expected to run OpenAI's custom models locally, converting handwritten content into text and allowing users to sync this information with ChatGPT for further inquiries [10][11]

Group 2: Design and Development
- The pen's design involves collaboration with Jony Ive, former Chief Design Officer at Apple, indicating a focus on professional design expertise [3][13]
- OpenAI's acquisition of Jony Ive's hardware company for approximately $6.5 billion last year marks a significant step in its hardware development strategy [14]

Group 3: Strategic Implications
- The choice of a pen as the first hardware product reflects OpenAI's long-term vision of a seamless AI experience that minimizes distractions and enhances user interaction [19][22]
- The product aims to fill a gap in the current ecosystem, reducing reliance on major platforms like Apple and Google, and potentially opening new revenue streams through hardware and services [25][27]
Quantum Bit (量子位) is hiring editors and writers
量子位· 2026-01-04 05:21
Core Viewpoint
- The article highlights the ongoing AI boom and invites readers to join Quantum Bit, which tracks AI advancements and has established itself as a leading content platform in the industry [1]

Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]

AI Industry Direction
- Responsibilities include tracking innovation in infrastructure such as chips, AI infrastructure, and cloud computing, and producing accessible interpretations of cutting-edge research and technical reports from major conferences [6][7]
- The company offers a dynamic work environment, opportunities to build personal influence, and professional mentorship for newcomers [6]

AI Finance Direction
- This role focuses on venture capital and financial reporting in the AI sector, tracking capital movements and producing analyses of investment trends and company strategies [9]

AI Product Direction
- Responsibilities involve assessing AI applications and hardware, writing in-depth evaluations of new products, and engaging with entrepreneurs and experts in the field [10]

Company Growth and Impact
- As of 2025, Quantum Bit reports over 2.4 million WeChat subscribers and more than 7 million users across platforms, with daily readership exceeding 2 million [12]
The "AI 100" list opens its call for entries; the AI product "annual gathering" can't stop | Quantum Bit Think Tank
量子位· 2026-01-04 05:21
Core Insights
- The article discusses the many keywords that have emerged in the AI product sector by 2025, highlighting transformative AI products that are reshaping the industry [4]
- The "AI 100" list by Quantum Bit Think Tank aims to evaluate and recognize the top AI products in China, reflecting the current landscape and future trends in AI [4][12]

Group 1: AI 100 List Overview
- The "AI 100" list is divided into three main categories: "Flagship AI 100," "Innovative AI 100," and the top three products in ten popular sub-sectors [6]
- The "Flagship AI 100" focuses on the strongest AI products of 2025, showcasing those that have achieved significant technological breakthroughs and practical application value [7]
- The "Innovative AI 100" aims to identify products expected to emerge in 2026, representing cutting-edge AI technology and potential industry disruptors [8]

Group 2: Sub-sector Focus
- The ten sub-sectors for the top-three selections are AI Browser, AI Agent, AI Smart Assistant, AI Workbench, AI Creation, AI Education, AI Healthcare, AI Entertainment, Vibe Coding, and AI Consumer Hardware [9]
- This categorization is designed to reflect development trends within each specific field more precisely [9]

Group 3: Application and Evaluation
- The evaluation employs a dual assessment system combining quantitative and qualitative measures, focusing on user data and expert evaluations [13]
- Quantitative metrics include user scale, growth, activity, and retention, while qualitative assessment considers long-term potential, technology, market space, and user experience [13]