Large Language Models
DeepSeek's ultimate ambition: turning the basic language of large language models into images
36Kr · 2025-10-21 12:52
Core Insights
- DeepSeek has open-sourced DeepSeek-OCR, an OCR model that achieves state-of-the-art results on benchmarks such as OmniDocBench [1]
- The motivation for entering the OCR field is to address the computational bottleneck of long-context processing in large language models (LLMs) [4][6]
- The paper proposes that text information can be efficiently compressed through optical 2D mapping, allowing vision-language models (VLMs) to decompress the original information from images [4][6]
Group 1: Long Context Processing
- The pursuit of longer context in LLMs has led to a competitive arms race, with token windows expanding from thousands to millions [7]
- The core limitation arises from the attention mechanism in the Transformer architecture, whose computational complexity and memory usage grow quadratically with sequence length [7]
- DeepSeek-AI's engineers pose a more fundamental question: instead of merely optimizing attention computation, can the number of tokens itself be compressed? [7][10]
Group 2: Visual Tokens vs. Text Tokens
- Visual tokens are the basic units of information processed by vision models, while text tokens are the units consumed by LLMs [8]
- A 1024x1024 image can be divided into 4096 visual tokens, significantly fewer than the text tokens needed to represent the same dense content [9]
- The insight that the visual modality can serve as an efficient compression medium for text led to the creation of DeepSeek-OCR [9]
Group 3: DeepEncoder and Compression Techniques
- DeepSeek-OCR is essentially a proof of concept for an "optical compression-decompression" system [10]
- The DeepEncoder, a key innovation, is designed to handle high-resolution inputs while producing a minimal number of visual tokens [11][12]
- The architecture consists of three stages: a local detail processor, a compression module, and a global attention layer [14][16]
Group 4: Performance Metrics
- At a 10.5x compression ratio, 64 visual tokens decode 600-700 text tokens with an OCR accuracy of 96.5% (a worked example of this ratio appears in the sketch after this summary) [17][18]
- At a 20x compression ratio, the model still maintains around 60% accuracy while decoding over 1200 text tokens [17][18]
- DeepSeek-OCR outperforms existing models such as GOT-OCR2.0 and MinerU2.0 in both performance and token efficiency [19][20]
Group 5: Future Vision and Memory Simulation
- The team aims to simulate the forgetting mechanism of human memory, which naturally prioritizes relevant information while compressing less important details [25][27]
- The multi-resolution design of DeepSeek-OCR provides a technical foundation for managing memory in a way that mimics human cognitive processes [29][30]
- The ultimate goal is a system that balances information retention and computational efficiency, potentially leading to a new paradigm for AI memory and input systems [32][35]
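To make the token arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The 16x16 patch size is an assumption (the article states only that a 1024x1024 image maps to 4096 visual tokens, which is consistent with 16-pixel patches), and the compression-ratio call simply restates the reported 64-visual-token / 600-700-text-token operating point.

```python
# Back-of-the-envelope sketch of optical text compression.
# Assumptions (not from the article): a ViT-style encoder with 16x16 pixel
# patches, so a 1024x1024 image yields (1024 // 16) ** 2 = 4096 raw visual
# tokens. The figures passed to compression_ratio() restate the article's
# reported 64-token vs. 600-700-text-token operating point.

def visual_token_count(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder would produce."""
    return (height // patch) * (width // patch)

def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """How many text tokens each visual token stands in for."""
    return text_tokens / visual_tokens

print(visual_token_count(1024, 1024))                        # 4096
print(compression_ratio(text_tokens=672, visual_tokens=64))  # 10.5
```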
Decoding AI from the brain: a conversation with neural network pioneer Terrence Sejnowski
晚点LatePost· 2025-10-21 03:09
Core Insights
- The article discusses the evolution of artificial intelligence (AI) and its relationship with neuroscience, highlighting the contributions of key figures such as Terrence Sejnowski and Geoffrey Hinton to the development of deep learning and neural networks [3][4][5]
Group 1: Historical Context and Contributions
- The collaboration between Sejnowski and Hinton in the 1980s led to significant advances in AI, particularly through the introduction of the Boltzmann machine, which combined neural networks with probabilistic modeling [3][4]
- Sejnowski's work laid the foundation for computational neuroscience, influencing AI algorithms such as multi-layer neural networks and reinforcement learning [5][6]
Group 2: The Impact of Large Language Models
- The emergence of ChatGPT and other large language models has transformed perceptions of AI, demonstrating the practical value of neural network research [4][6]
- Sejnowski's recent books, "The Deep Learning Revolution" and "ChatGPT and the Future of AI", reflect on AI's journey from its inception to its current state and future possibilities [6][10]
Group 3: Collaboration with AI
- Sejnowski used ChatGPT while writing "ChatGPT and the Future of AI", highlighting the model's ability to summarize and simplify complex concepts for broader audiences [9][10]
- The interaction between users and large language models is described as a "mirror effect": the quality of the response depends on the quality of the user's input and understanding [11][12]
Group 4: Neuroscience and AI Memory
- Current AI models exhibit limitations in memory retention akin to human amnesia, as they lack long-term memory capabilities [13][14]
- The article draws parallels between human memory systems and AI, emphasizing that advances in understanding the brain are needed to improve AI memory functions [13][14]
Group 5: Future Directions in AI and Neuroscience
- Neuromorphic chips, which mimic the functioning of neurons, represent a potential shift in AI technology, promising lower energy consumption and higher performance [19][20]
- The article suggests the future of AI may involve a transition from digital to analog computing, analogous to the shift from gasoline to electric vehicles [20][21]
Group 6: The Role of Smaller Models
- There is a growing debate over the effectiveness of smaller, specialized models versus larger ones, with smaller models being more practical for specific applications [35][36]
- Data quality is emphasized as a critical factor in model performance, and smaller models have the potential to reduce biases and errors [36][37]
Group 7: Regulatory Perspectives
- The article stresses the importance of self-regulation within the scientific community to manage AI risks, rather than relying solely on government intervention [30][34]
- It highlights the need for a balanced approach to AI development that weighs benefits against potential risks while fostering innovation [30][34]
ByteDance's Seed reorganizes again: Zhu Wenjia now reports to Wu Yonghui
Xi Niu Cai Jing· 2025-10-21 02:22
Group 1
- The reporting line for Zhu Wenjia, the former head of ByteDance's Seed large model team, has changed from CEO Liang Rubo to Wu Yonghui, the current head of Seed [2]
- Earlier this year, ByteDance recruited Wu Yonghui from Google, where he was Vice President of Research at DeepMind, prompting structural adjustments within the large model team [2]
- Several algorithm and technology leaders who previously reported to Zhu Wenjia now report to Wu Yonghui, while Zhu Wenjia has shifted his focus to model applications [2]
Group 2
- The Seed team has undergone multiple adjustments, including the dismissal of Qiao Mu, head of the large language model, due to personal misconduct [2]
- Yang Jianchao, head of the visual large model, has announced a leave of absence, and AI Lab director Li Hang has retired but been rehired [2]
- ByteDance's Flow division has also experienced significant organizational changes, with Zhao Qi moving to the Spring product department and reporting directly to Zhu Jun [2]
The Financial Management Committee of the China Association of Chief Financial Officers successfully holds its 2025 Autumn Forum
Xin Jing Bao· 2025-10-21 02:08
Core Insights
- The forum focused on the transformation of financial management in the era of artificial intelligence, emphasizing the shift from traditional accounting toward value creation and proactive risk management [1][2]
Group 1: Forum Overview
- The 2025 Autumn Forum was successfully held in Beijing, organized by the Financial Management Committee of the China Association of Chief Financial Officers, with a theme centered on "Deep Language Models (DeepSeek) and Penetrative Financial Management" [1]
- Keynote speeches highlighted the role of deep learning models in reshaping financial management practices across sectors, including state-owned enterprises and financial institutions [2][3]
Group 2: Key Presentations
- Experts from different fields shared insights on applying technology to financial risk management, with a focus on proactive measures rather than mere compliance [3][4]
- Presentations covered practical applications of DeepSeek in financial scenarios such as intelligent reconciliation, risk warning, and cash flow forecasting [3][4]
Group 3: Roundtable Discussion
- A roundtable discussion addressed the challenges and opportunities of AI-driven financial management, emphasizing the need for high-quality data and skilled professionals [5][6]
- Participants discussed the importance of contract-based cash flow management in improving organizations' overall funding efficiency [6]
Group 4: Future Directions
- The forum concluded with a call for continued collaboration among industry peers to leverage deep learning technologies for greater value creation in financial management [7]
- Ningbo Bank expressed its commitment to fostering partnerships and building a new ecosystem for intelligent finance in the era of large models [7]
Just in: a major DeepSeek breakthrough shatters the context-length constraint on large models
36Kr · 2025-10-20 23:22
Core Insights
- DeepSeek has opened a new technical path in the large language model race by open-sourcing the DeepSeek-OCR model, which proposes "Contextual Optical Compression": efficient information compression through text-to-image conversion [1][8]
Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves 97% decoding accuracy at a 10x compression ratio, indicating near-lossless compression, while maintaining roughly 60% accuracy at a 20x compression ratio [3][21]
- By converting text tokens into visual tokens, DeepSeek-OCR can express comparable textual content with far fewer tokens, offering a new way to address the high computational cost of processing long texts in large language models [6][11]
- In practical tests, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23]
Group 2: Technical Architecture
- The architecture consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder (a toy structural sketch of this split follows below) [12][18]
- DeepEncoder combines local and global attention in a dual structure to achieve high-fidelity visual understanding while sharply reducing the number of vision tokens generated from document images [14][18]
Group 3: Data and Training
- The training process is relatively straightforward: DeepEncoder is trained independently, and then the complete DeepSeek-OCR model is trained on a large dataset for effective learning [20][21]
- The model has been trained on a diverse dataset that includes OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across document types [25][36]
Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from document types such as financial reports and scientific literature [24][29]
- The research team plans to further explore integrating digital and optical text pre-training methods and to evaluate optical compression in real long-text settings, indicating a promising direction for future research [39]
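As a way to visualize the encoder-decoder split summarized above, here is a toy structural sketch in PyTorch. It is not DeepSeek's implementation: the layer choices (a convolutional patch embedding standing in for the local-detail stage, a strided convolution as the compression module, a single self-attention layer as the global stage), the hidden width, and the 16x token reduction are all illustrative assumptions.

```python
# Toy structural sketch (not DeepSeek's actual code) of a vision encoder that
# shrinks an image into a small set of "visual tokens", which a language
# decoder such as DeepSeek3B-MoE would then attend over. All layer sizes and
# the 16x reduction factor are illustrative assumptions.
import torch
import torch.nn as nn

class ToyDeepEncoder(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Stage 1: 16x16 patch embedding (stands in for the local-detail stage).
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # Stage 2: 4x4 strided convolution, cutting the token count by 16x.
        self.compress = nn.Conv2d(dim, dim, kernel_size=4, stride=4)
        # Stage 3: global self-attention over the small remaining token set.
        self.global_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = self.patchify(image)          # (B, dim, 64, 64) for a 1024x1024 input
        x = self.compress(x)              # (B, dim, 16, 16) -> 256 visual tokens
        x = x.flatten(2).transpose(1, 2)  # (B, 256, dim)
        return self.global_attn(x)

encoder = ToyDeepEncoder()
visual_tokens = encoder(torch.randn(1, 3, 1024, 1024))
print(visual_tokens.shape)  # torch.Size([1, 256, 256])
```

For a 1024x1024 input, the sketch emits 256 visual tokens instead of 4096 raw patch tokens; in the real system, the text decoder conditions on such a compressed sequence to reconstruct the document text.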
Banma Zhixing plans a Hong Kong listing; the CSRC requests supplementary explanations of equity changes and other matters
Zhi Tong Cai Jing· 2025-10-20 07:09
Core Viewpoint
- The China Securities Regulatory Commission (CSRC) has requested additional information from Banma Zhixing regarding its equity changes and business operations as part of its overseas listing application process [1][2][3]
Group 1: Equity Changes
- Banma Zhixing is required to explain the pricing basis for its past capital increases and equity transfers, showing that the pricing was fair and that there are no issues with capital contributions or compliance [1]
- The company must provide updates on its capital reduction and increase registration process as of August 2025, including compliance with procedures and tax payments [1]
- The CSRC has asked for clarification on whether any requirements regarding the identification of state-owned shareholders remain uncompleted [1]
Group 2: Business Operations
- Banma Zhixing must detail its business scope, including value-added telecommunications services and advertising, confirming whether these activities are being conducted and whether the necessary licenses have been obtained [2]
- The company must report on the progress of its subsidiary Zhi Yun Tu's telecommunications business license and the specific activities it plans to undertake [2]
- A clear explanation of the business model involving large language models is required, including whether the relevant model registrations have been completed [2]
Group 3: Compliance and Operations
- The company must confirm whether it has developed or operates websites, apps, or other digital products, and detail the types of information content provided to third parties along with user data protection measures [2]
- Updates on any ongoing litigation or arbitration cases must be provided so that potential obstacles to the overseas listing can be assessed [2]
- Banma Zhixing is required to confirm compliance with the regulations on overseas issuance and listing, including that no prohibitive circumstances exist [3]
Group 4: Listing and Fundraising
- The company must disclose the expected fundraising amount if the over-allotment option is fully exercised [3]
- Information must be provided on any pledges, freezes, or other encumbrances on shares held by shareholders participating in the "full circulation" scheme [3]
- Details of the regulatory procedures followed for the spin-off listing must be set out [3]
Group 5: Company Overview
- Banma Zhixing is a supplier of intelligent cockpit solutions, focused on turning vehicles into interactive smart partners through its self-developed automotive operating system and AI architecture [3]
- The company aims to enhance the in-car experience by enabling natural voice control and personalized cabin experiences for vehicle owners [3]
Banma Zhixing's adjusted net losses total RMB 2.48 billion over three years and one interim period; "milestone" figures decline
Zhong Guo Jing Ji Wang· 2025-10-20 06:42
Core Viewpoint
- The China Securities Regulatory Commission (CSRC) has requested additional documentation from Zebra Network Technology Co., Ltd. (Banma Zhixing) regarding its overseas listing, focusing on equity changes, business operations, and regulatory compliance [1][2][3]
Group 1: Equity Changes
- The CSRC requires Banma Zhixing to clarify the pricing basis for past capital increases and equity transfers, demonstrating fairness and compliance with capital contribution obligations [1]
- The company must also provide updates on its directed capital reduction and increase registered with the industrial and commercial authorities, including compliance with procedures and tax payments [1]
Group 2: Business Operations
- Banma Zhixing is asked to detail its business scope, including value-added telecommunications services and advertising, confirming whether it holds the necessary licenses and whether it collaborates with third parties [2]
- The company needs to explain the progress of its subsidiary's telecommunications business license and its specific business plans [2]
- A clear description of the business model involving large language models is required, along with confirmation of any necessary registrations [2]
Group 3: Compliance and Legal Matters
- The company must confirm whether it has any pending lawsuits or arbitration cases that could hinder its overseas listing [2]
- It is also required to confirm compliance with the regulations governing overseas securities issuance and listing [3]
Group 4: Financial Performance
- Banma Zhixing reported revenue of RMB 805 million, RMB 872 million, RMB 824 million, and RMB 136 million for 2022, 2023, 2024, and the first quarter of 2025, respectively, with losses for those periods of RMB 878 million, RMB 876 million, RMB 847 million, and RMB 1.582 billion [6][7]
- Adjusted net losses for the same periods were RMB 726 million, RMB 792 million, RMB 757 million, and RMB 201 million, a cumulative RMB 2.48 billion over the three years and one interim period [6][7]
Group 5: Market Position and Challenges
- The number of Banma Zhixing's designated (design-win) projects, which are critical to future business growth, fell from 37 to 30 in the first quarter of 2025 due to delays in major clients' internal approvals and contract signings [8]
The ultimate goal of Taotian's AI: the greatest form has no shape
晚点LatePost· 2025-10-20 03:51
Core Insights
- The article discusses the evolution and potential of AI applications in Chinese e-commerce, focusing on Alibaba's Taobao platform and its integration of AI technologies to enhance user experience and operational efficiency [2][3][8]
Group 1: AI Application in E-commerce
- The prediction of a super AI application with over 100 million DAU in China has proven overly optimistic; the largest current application, Doubao, has reached only 47 million DAU [2]
- The fundamental needs of e-commerce users remain unchanged; AI provides new methods to meet those needs [8]
- Taobao's AI initiatives focus on three areas: improving the underlying technology for better data processing, providing AI tools that make merchants more efficient, and creating innovative AI-driven shopping experiences for users [3][15]
Group 2: User Experience and AI Integration
- Taobao aims to integrate AI seamlessly into the user journey, allowing users to solve problems without needing to understand the underlying AI technology [7][10]
- Taobao's AI products are designed around specific user needs, such as AI fitting rooms and personalized recommendations, enhancing the overall shopping experience [9][18]
- The article highlights the importance of understanding user intent and improving product data quality to strengthen search and recommendation systems [12][16]
Group 3: Operational Efficiency and Merchant Support
- Taobao's AI initiatives have delivered significant operational improvements, such as automatic image generation and AI customer service, saving merchants substantial costs [18]
- The platform focuses on helping merchants reduce operating costs while improving product data quality, which in turn benefits the wider ecosystem [17]
- Integrating AI across operations aims to raise efficiency and drive sales growth for merchants, ultimately benefiting the platform itself [15][17]
Kevin Kelly: the implementation and practice of AI technology in the Chinese context
Xin Lang Cai Jing· 2025-10-20 01:33
Group 1: ESG Services and Initiatives
- The Sina Finance ESG Rating Center offers 14 ESG services, including information, reports, training, and consulting, to help listed companies promote ESG concepts and improve their sustainable development performance [1]
- The 2025 Sustainable Global Leaders Conference will be held from October 16 to 18 in Shanghai, with discussions including AI technology in the Chinese context [1]
Group 2: AI in Education
- Kevin Kelly is optimistic about AI's application in education, highlighting its potential to balance educational resources and maximize efficiency by letting students learn at their own pace [3][4]
- AI is seen as a tool to accelerate learning and broaden the range of knowledge available to students [4]
Group 3: AI's Impact on Human Capability
- Kelly believes AI enhances human capabilities rather than making people lazy, comparing it to how calculators accelerated arithmetic [4]
- The rapid progress of AI has surprised many, particularly its ability to enhance cognitive skills and language translation [5]
Group 4: AI and Sustainable Development
- AI is viewed as a significant advantage for sustainable development, with the potential to solve complex problems that human thinking alone cannot address [6]
- Kelly emphasizes the need for ethical considerations in AI development, particularly in decision-making scenarios such as self-driving cars [7]
Group 5: Future of AI and Global Collaboration
- The future of AI remains uncertain, including whether different countries can collaborate on the technology and whether AI will develop distinct characteristics in different cultural contexts [8]
- Kelly suggests AI can accelerate the development of green technologies and contribute to sustainable practices [9]
Group 6: ESG Investment Development
- Sina Finance has launched multiple ESG innovation indices to give investors more options focused on corporate ESG performance [10]
- The establishment of the China ESG Leaders Organization Forum aims to promote the development of ESG investment in China's asset management industry [10]
Classes start today! A Tsinghua-led team maps out the autonomous-driving VLA learning path: algorithms + practice
自动驾驶之心· 2025-10-19 23:32
Core Viewpoint
- The focus of academia and industry is shifting toward VLA (Vision-Language-Action), which brings human-like reasoning to autonomous driving for more reliable and safer behavior [1][4]
Summary by Sections
Overview of Autonomous Driving VLA
- Autonomous driving VLA can be categorized into modular VLA, integrated VLA, and reasoning-enhanced VLA [1]
- Traditional perception methods such as BEV (Bird's Eye View) perception and lane detection are maturing, and attention from both academia and industry is shifting away from them [4]
Key Content of Autonomous Driving VLA
- Core components include visual perception, large language models, action modeling, large-model deployment, and dataset construction [7]
- Cutting-edge techniques such as Chain-of-Thought (CoT) prompting, Mixture of Experts (MoE), Retrieval-Augmented Generation (RAG), and reinforcement learning are at the forefront of the field [7]
Course Structure
- The "Autonomous Driving VLA and Large Model Practical Course" covers cutting-edge algorithms in the three VLA subfields, along with practical assignments [8]
Chapter Summaries
1. Introduction to VLA Algorithms - A comprehensive overview of VLA algorithms, their concepts, and their development history, together with open-source benchmarks and evaluation metrics [14]
2. Algorithm Fundamentals of VLA - Foundational knowledge of the Vision, Language, and Action modules, plus a section on deploying and using popular large models [15]
3. VLM as an Autonomous Driving Interpreter - The role of the VLM (Vision-Language Model) in scene understanding, covering classic and recent algorithms such as DriveGPT4 and TS-VLM [16]
4. Modular & Integrated VLA - How language models have evolved from producing passive descriptions to acting as planning components, emphasizing the direct mapping from perception to control [17]
5. Reasoning-Enhanced VLA - The trend of adding reasoning modules to driving models that emit control signals and natural-language explanations in parallel (a minimal interface sketch of such an output follows after this summary) [18]
6. Capstone Project - Practical tasks starting from network construction, letting participants build custom datasets and fine-tune models, with an emphasis on hands-on experience [21]
Learning Outcomes
- The course aims to advance the understanding of autonomous driving VLA in both academic and industrial settings, equipping participants to apply VLA concepts in real-world projects [23]
Course Schedule
- The course starts on October 20 and runs for roughly two and a half months, featuring recorded video lectures and online Q&A sessions [24]
Prerequisites
- Participants are expected to have foundational knowledge of autonomous driving, familiarity with transformer models and reinforcement learning, and basic mathematical background [25]
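To make the "control signals plus natural-language explanation" output of a reasoning-enhanced VLA concrete, here is a minimal interface sketch in Python. It is illustrative only: the DrivingAction fields, the rule-based stub standing in for a real model, and the example inputs are assumptions, not course material or any specific model's API.

```python
# Minimal interface sketch of one reasoning-enhanced VLA step: the policy takes
# scene context plus a language instruction and returns low-level control
# signals together with a natural-language justification emitted in parallel.
# The field names and the rule-based stub are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DrivingAction:
    steer: float       # steering angle in radians; negative = left
    throttle: float    # 0.0 (coast) to 1.0 (full throttle)
    brake: float       # 0.0 to 1.0
    explanation: str   # reasoning emitted alongside the control signals

def vla_step(scene_description: str, instruction: str) -> DrivingAction:
    """Stand-in for a VLA forward pass; a real model would consume camera frames."""
    if "pedestrian" in scene_description:
        return DrivingAction(
            steer=0.0, throttle=0.0, brake=0.8,
            explanation="A pedestrian is crossing ahead, so I brake and hold the lane.",
        )
    return DrivingAction(
        steer=0.05, throttle=0.3, brake=0.0,
        explanation=f"Road is clear; proceeding per instruction: {instruction}.",
    )

print(vla_step("pedestrian near crosswalk", "turn right at the next junction"))
```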