DeepSeek Releases DeepSeek-OCR 2
Mei Ri Jing Ji Xin Wen· 2026-01-27 06:15
Meijing AI flash, January 27: DeepSeek has released the new DeepSeek-OCR 2 model. Its novel DeepEncoder V2 method lets the AI dynamically reorder the parts of an image according to their meaning, rather than mechanically scanning from left to right, mimicking the logical flow humans follow when viewing a scene. As a result, the model outperforms traditional vision-language models on images with complex layouts, achieving smarter visual understanding with stronger causal reasoning. ...
DeepSeek Releases DeepSeek-OCR 2: AI Can "See" an Image in the Same Logical Order as a Human
Hua Er Jie Jian Wen· 2026-01-27 05:52
DeepSeek has released the new DeepSeek-OCR 2 model. Its novel DeepEncoder V2 method lets the AI dynamically reorder the parts of an image according to their meaning, rather than mechanically scanning from left to right, mimicking the logical flow humans follow when viewing a scene. As a result, the model outperforms traditional vision-language models on images with complex layouts (such as documents or charts), achieving smarter visual understanding with stronger causal reasoning. ...
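Neither summary specifies how the reordering works; a toy sketch of the general idea (the patch scoring and all names here are invented for illustration, not DeepSeek's actual DeepEncoder V2 mechanism) could look like:

```python
import numpy as np

def split_into_patches(image, patch=4):
    """Split an (H, W) image into non-overlapping patch x patch tiles."""
    h, w = image.shape
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append(((r, c), image[r:r + patch, c:c + patch]))
    return tiles

def semantic_order(tiles, score_fn):
    """Reorder patches by a 'semantic' score instead of raster (left-to-right) order."""
    return sorted(tiles, key=lambda t: score_fn(t[1]), reverse=True)

# Stand-in semantic score: patch variance as a crude saliency proxy.
image = np.zeros((8, 8))
image[4:8, 4:8] = np.indices((4, 4)).sum(axis=0) % 2  # one textured, "meaningful" patch
ordered = semantic_order(split_into_patches(image), score_fn=np.var)
print(ordered[0][0])  # the textured patch at (4, 4) is visited first
```

Here patch variance stands in for a learned semantic score; the actual model would presumably derive the ordering from image content inside the encoder itself.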
After the "DeepSeek Moment," What Keeps the "China Moment" Trending?
Sou Hu Cai Jing· 2026-01-26 16:06
Group 1
- DeepSeek's R1 model was a significant breakthrough in AI, marking the beginning of China's technological innovation in 2025, referred to as the "DeepSeek moment" [1]
- In basic research, China made notable advances, including breakthroughs in renewable energy and genetics recognized in the journal Science's "Top Ten Scientific Breakthroughs of 2025" [3]
- Chinese open-source AI models such as DeepSeek and Qianwen accounted for 17% of global downloads, surpassing the US and leading the world [3]

Group 2
- China's determination to achieve high-level technological self-reliance has driven significant innovation, particularly in response to US restrictions on advanced chips [5]
- Chinese companies such as Huawei, Alibaba, and Baidu are making strides in chip technology, with predictions that local firms will capture 80% of the AI chip market by 2026 [3][5]
- China has a substantial pool of research talent, with over 30,000 AI researchers, three times as many as the US, and a strong presence in high-impact scientific publications [5]

Group 3
- Optimistic sentiment from both domestic and international investors has fueled strong performance in Chinese tech stocks, particularly in AI, semiconductors, and robotics [5]
- Despite significant progress, challenges remain in achieving full technological independence, with some critical issues still unresolved [6]
- The focus on innovation and industrial development presents new opportunities for China's tech sector to continue advancing [6]
Data: Cumulative Trading Volume of Seeker Token SKR Exceeds $200 Million
Xin Lang Cai Jing· 2026-01-26 14:54
Group 1
- Since the Solana Mobile airdrop, the Seeker token SKR has recorded cumulative trading volume exceeding 200 million USD [1]
- Approximately 85% of the total airdrop amount has been claimed [1]
- On-chain SKR trading is concentrated on Meteora, which accounts for 57.4% of total volume [1]
PriceSeek Alert: LLDPE Spot Prices Raised Sharply
Xin Lang Cai Jing· 2026-01-26 11:09
Core Viewpoint
- The price of LLDPE in East China has risen significantly, indicating potential supply tightness or increased demand, which is favorable for spot prices [2][5]

Price Movement
- On January 26, LLDPE 7042 was quoted at 7,050 CNY/ton, up 200 CNY/ton, while LLDPE 7050 was quoted at 7,100 CNY/ton, also up 200 CNY/ton [1][4]

Market Sentiment
- The 2605 polyethylene futures contract on the Dalian Commodity Exchange closed at 6,865 CNY/ton, up 116 CNY/ton, with trading volume of 719,797 lots and open interest up 3,532 lots, indicating strong bullish sentiment [2][5]

Price Forecast
- The rise in spot prices is expected to reinforce upward expectations for futures, supporting future price trends [2][5]

Pricing Model
- The business society's benchmark price is derived from big data and pricing models, serving as a trading guide for settlement prices on specified dates or average prices over specified periods [2][5]
Where Does DeepSeek-R1's Reasoning Intelligence Come From? New Google Research: Multiple Personas Arguing Inside the Model
36Kr· 2026-01-26 09:14
Core Insights
- The reasoning capabilities of large models have improved significantly over the past two years, particularly on complex tasks involving mathematics, logic, and multi-step planning, with models like OpenAI's o series, DeepSeek-R1, and QwQ-32B showing a clear advantage over traditional instruction-tuned models [1]
- Recent research indicates that the gain in reasoning ability is not merely due to more computational steps, but stems from models simulating a complex, multi-agent interaction structure during reasoning, referred to as a "society of thought" [2]

Group 1: Reasoning Mechanisms
- The study finds that reasoning models like DeepSeek-R1 and QwQ-32B exhibit significantly higher diversity of perspectives than baseline and instruction-tuned models, activating a broader range of features related to personality and expertise and producing more substantial conflicts among those features [3]
- The internal structure of these multi-agent-like interactions manifests through dialogic behaviors, including question-answer sequences, perspective shifts, and the integration of conflicting viewpoints, which together enhance cognitive strategies and explain the accuracy advantage on reasoning tasks [3][4]

Group 2: Social Interaction and Cognitive Strategies
- The findings suggest that social organization of thought enables more efficient exploration of solution spaces, and Google proposes a new research direction that systematically leverages "collective intelligence" through agent organization [4]
- Controlled reinforcement learning experiments indicate that even when accuracy is the sole reward signal, base models spontaneously increase dialogic behaviors, and introducing conversational scaffolding accelerates the growth of reasoning capability well beyond non-tuned base models [3][4]

Group 3: Dialogic Behaviors and Emotional Roles
- The study identifies four types of dialogic behaviors in reasoning trajectories: question-answer sequences, perspective shifts, viewpoint conflicts, and viewpoint reconciliation, all crucial for reasoning accuracy [10][11]
- Analysis of social-emotional roles within reasoning trajectories shows that models like DeepSeek-R1 engage in more reciprocal emotional role structures, exhibiting both positive and negative emotional interactions, unlike instruction-tuned models, which primarily show unidirectional guidance [16][17]

Group 4: Experimental Results and Implications
- Even at similar reasoning trajectory lengths, reasoning models display a higher frequency of dialogic behaviors and social-emotional roles, particularly on complex tasks, indicating that dialogic features enhance reasoning performance [13][18]
- Experiments show that positively steering dialogic features can nearly double accuracy on reasoning tasks, while negative steering significantly suppresses these behaviors, underscoring the importance of dialogic interactions in effective problem solving [18][20]
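As a rough illustration, the four dialogic behaviors named above could be counted in a reasoning trace with a surface-level keyword tagger. The marker lists below are invented for illustration and are not the paper's actual annotation scheme:

```python
import re

# Hypothetical surface markers for the four dialogic behaviors.
MARKERS = {
    "question_answer": r"\b(what if|why|how about|is it)\b.*\?",
    "perspective_shift": r"\b(on the other hand|alternatively|wait)\b",
    "viewpoint_conflict": r"\b(but|however|that contradicts)\b",
    "viewpoint_reconciliation": r"\b(so overall|combining both|therefore)\b",
}

def tag_dialogic_behaviors(trace):
    """Count crude keyword matches for each dialogic behavior in a reasoning trace."""
    return {name: len(re.findall(pattern, trace, flags=re.IGNORECASE))
            for name, pattern in MARKERS.items()}

trace = ("What if x is negative? Then the root is complex. "
         "Wait, the problem says x > 0. However, we still need x != 1. "
         "So overall, the answer holds for x > 0, x != 1.")
print(tag_dialogic_behaviors(trace))  # one hit per behavior in this toy trace
```

A real study would presumably use model-based or human annotation rather than keywords, but the sketch shows what "frequency of dialogic behaviors" means operationally.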
"DeepSeek-V3 Was Built on Our Architecture": "Europe's OpenAI" CEO Draws Fire for Outlandish Claim
36Kr· 2026-01-26 07:44
Core Viewpoint
- The discussion centers on the competitive landscape in AI, particularly the contrasting approaches of Mistral and DeepSeek to sparse mixture-of-experts (MoE) models, with Mistral's CEO acknowledging China's strong position in AI and the significance of open-source models [1][4]

Group 1: Company Perspectives
- Mistral's CEO, Arthur Mensch, argues that releasing open-source models is a strategy for progress rather than competition, pointing to Mistral's early open-source releases [1]
- Mensch claimed that the recently released DeepSeek-V3 is built on an architecture Mistral proposed, framing AI development as collaborative as well as competitive [1][4]
- The audience was skeptical of this claim, with some suggesting that Mistral's own recent models borrowed heavily from DeepSeek's architecture [4][13]

Group 2: Technical Comparisons
- Both DeepSeek's models and Mistral's Mixtral use sparse MoE systems to cut computational cost while increasing model capability, but they differ fundamentally in approach [9]
- Mixtral emphasizes engineering, showing the effectiveness of a strong base model combined with mature MoE technology, while DeepSeek focuses on algorithmic innovation to address shortcomings of traditional MoE systems [9][12]
- DeepSeek introduces fine-grained expert segmentation, enabling more flexible combinations of experts, in contrast to Mixtral's flat distribution of knowledge across a few large experts [11][12]

Group 3: Community Reactions
- The community reacted critically to Mistral's statements, with some users expressing disbelief and pointing out the similarities between Mistral's and DeepSeek's architectures [2][17]
- There is a sentiment that Mistral, once a pioneer in open-source AI, has lost its innovative edge, while DeepSeek has gained more influence in sparse MoE and MLA technologies [14][17]
- The race for foundational models is expected to continue, with DeepSeek reportedly targeting significant releases in the near future [19]
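The contrast between coarse routing (a few large experts, Mixtral-style) and fine-grained expert segmentation (many small experts, DeepSeek-style) can be sketched with toy gating logic. This is a simplified illustration under equal compute budgets, not either company's implementation:

```python
import numpy as np

def top_k_routing(gate_logits, k):
    """Pick the top-k experts for a token and softmax-normalize their weights."""
    top = np.argsort(gate_logits)[::-1][:k]
    w = np.exp(gate_logits[top] - gate_logits[top].max())
    return top, w / w.sum()

rng = np.random.default_rng(0)
logits_coarse = rng.normal(size=8)  # coarse: 8 large experts, route to top-2
logits_fine = rng.normal(size=64)   # fine-grained: 64 small experts, route to top-8

experts_c, w_c = top_k_routing(logits_coarse, k=2)
experts_f, w_f = top_k_routing(logits_fine, k=8)

# Roughly the same active compute per token, but far more possible expert
# combinations: C(8, 2) = 28 versus C(64, 8) ≈ 4.4e9 subsets.
print(len(experts_c), len(experts_f))  # 2 8
```

The combinatorial gap is the intuition behind "more flexible combinations of experts": segmentation multiplies the number of distinct specialist mixtures a token can activate without raising per-token cost.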
DeepSeek's Latest Paper Explained: How Does mHC Train Stronger Models for Less? (Investment Notes, No. 243)
36Kr· 2026-01-26 07:38
Core Insights
- DeepSeek has released a significant paper on Manifold-Constrained Hyper-Connections (mHC), focusing on the fundamental question of how information flows stably through ultra-deep networks in large models, rather than on model parameters, data volume, or computational power [2]

Group 1: Residual Connections and Their Limitations
- Residual connections, introduced by Kaiming He's team in 2015, are a milestone in AI development, enabling deeper neural networks by addressing the vanishing gradient problem [3]
- Before residual connections, neural networks were limited to depths of roughly 20-30 layers because gradients decayed exponentially with depth, hindering effective feature learning [3][4]
- Residual connections introduced a "shortcut" for signal transmission, raising the depth of trainable networks from tens to hundreds or thousands of layers and forming the structural foundation of modern deep learning [4]

Group 2: Introduction of Hyper-Connections
- Hyper-Connections emerged as a response to the limitations of residual connections, providing multiple pathways for information transfer within a model, akin to a relay race with multiple runners [6][7]
- This approach distributes information across multiple parallel channels whose weights are allocated dynamically during training, improving the model's ability to handle complex, multi-source information [6][7]

Group 3: Challenges with Hyper-Connections
- Hyper-Connections have a critical flaw: the excessive freedom in information flow can unbalance the model's internal information flow [9]
- Models trained with Hyper-Connections can exhibit high volatility and loss divergence, indicating unstable information transmission [9]

Group 4: The Solution, mHC
- mHC (Manifold-Constrained Hyper-Connections) adds a crucial constraint to Hyper-Connections by employing a doubly stochastic matrix, ensuring that information is redistributed without amplification [11]
- This constraint prevents both signal explosion and signal decay, maintaining a stable flow of information through the network [13]
- mHC improves training stability and performance at the cost of only a 6.7% increase in training time, negligible next to the savings in computational resources and debugging time [13][14]

Group 5: Implications for Future AI Development
- mHC strikes a new balance between stability and efficiency, reducing computational costs by approximately 30% and shortening product iteration cycles [14]
- It supports the development of larger models, addressing the stability bottleneck in scaling to hundreds of billions or trillions of parameters [16]
- The mHC framework demonstrates that "constrained freedom" is more valuable than "complete freedom," suggesting a shift in AI architecture design from experience-driven to theory-driven approaches [16]
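The role of the doubly stochastic constraint can be checked numerically: if every row and column of the mixing matrix sums to 1, signal mass is redistributed across parallel streams but never amplified or attenuated. Below is a minimal sketch using Sinkhorn-Knopp normalization (an illustrative choice of projection; the paper's exact method may differ):

```python
import numpy as np

def sinkhorn(mat, iters=50):
    """Push a positive matrix toward doubly stochastic form by
    alternately normalizing rows and columns (Sinkhorn-Knopp)."""
    m = np.array(mat, dtype=float)
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)  # make rows sum to 1
        m /= m.sum(axis=0, keepdims=True)  # make columns sum to 1
    return m

# Unconstrained positive mixing weights across 4 parallel residual streams.
raw = np.exp(np.random.default_rng(1).normal(size=(4, 4)))
mix = sinkhorn(raw)

# Rows and columns both sum to ~1: total signal mass is preserved, so
# repeated mixing across layers can neither explode nor decay stream norms.
print(mix.sum(axis=0).round(6), mix.sum(axis=1).round(6))
```

Because a product of doubly stochastic matrices is itself doubly stochastic, stacking many such mixing steps keeps total mass fixed, which is the intuition behind "prevents both signal explosion and signal decay."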
DeepSeek: Less Is More
2026-01-26 02:49
Summary of DeepSeek Conference Call

Company and Industry Overview
- **Company**: DeepSeek
- **Industry**: Artificial Intelligence (AI) and Semiconductor Equipment in China

Key Points and Arguments
1. **Engram Module Launch**: DeepSeek has introduced the Engram module, which decouples storage from computation, reducing reliance on High Bandwidth Memory (HBM) and lowering infrastructure costs. The innovation aims to relieve bottlenecks in AI computing in China and suggests that future AI competition may center on more efficient hybrid architectures rather than ever-larger models [1][2][3]
2. **Efficiency Improvements**: The Engram module improves the efficiency of large language models by implementing "conditional memory," which allows better utilization of GPU resources. Decoupling static memory from computation is expected to improve AI system performance while reducing the need for expensive HBM [1][9][10]
3. **Infrastructure Cost Dynamics**: Infrastructure costs may shift from GPU to storage, as medium computational configurations may offer better cost-effectiveness than pure GPU expansion. AI inference capability is expected to improve beyond knowledge growth, highlighting the value of storage beyond pure computation [2][3][10]
4. **Next Generation Model**: DeepSeek's upcoming V4 model will use the Engram memory architecture, potentially achieving significant advances in code generation and inference. The model is expected to run on consumer-grade hardware such as the RTX 5090 and will be watched closely against key benchmarks [2][3][10]
5. **Investment Opportunities**: The report highlights potential investment opportunities in the Chinese semiconductor equipment sector, particularly Northern Huachuang (target price: RMB 514.2), Zhongwei Company (target price: RMB 364.32), and Changdian Technology (target price: RMB 49.49) [3][24][25]

Additional Important Insights
1. **Performance Comparison**: Despite stricter constraints on advanced computing and hardware acquisition, Chinese AI models have rapidly closed the performance gap with leading models like ChatGPT 5.2, a result attributed to efficiency-driven innovation rather than sheer computational expansion [8][14]
2. **Long-term Implications**: DeepSeek's architecture may lead to a more cost-effective, scalable, and adaptable AI ecosystem in China, potentially pressuring global competitors by reducing the marginal cost of high-level intelligence and the reliance on unlimited computational expansion [14][16]
3. **Engram's Unique Approach**: Engram's design enables more efficient memory usage and significantly lowers demand for HBM, enhancing the core transformer model without increasing FLOPs or parameter scale and improving overall system efficiency [11][18]
4. **Testing Results**: Tests on a 27-billion-parameter model show Engram outperforming on several benchmarks, particularly long-context processing, which is crucial for practical AI [16][18]
5. **Strategic Positioning**: DeepSeek's advances represent a strategic response to geopolitical and supply-chain constraints, emphasizing algorithmic and system-level innovation over direct hardware competition [16][18]

This summary encapsulates the critical insights from the conference call regarding DeepSeek's innovations, market positioning, and the broader implications for the AI and semiconductor industries in China.
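A toy sketch of the storage-compute decoupling described above (class and function names are hypothetical; the call summary does not specify Engram's actual mechanics): a static lookup table holds "memorized" content on cheap storage, and the compute path only retrieves and blends it, so the GPU-resident parameters stay small.

```python
import numpy as np

class ConditionalMemory:
    """Static key-value store queried by nearest neighbor. In the decoupled
    design it would live on cheap storage, outside the compute path's
    parameters (toy stand-in, not the real Engram module)."""
    def __init__(self, keys, values):
        self.keys = keys      # (N, d) memory keys
        self.values = values  # (N, d) memory payloads

    def lookup(self, query):
        idx = np.argmin(((self.keys - query) ** 2).sum(axis=1))
        return self.values[idx]

rng = np.random.default_rng(2)
d = 16
memory = ConditionalMemory(rng.normal(size=(1000, d)), rng.normal(size=(1000, d)))

def layer_with_memory(hidden):
    """Compute layer: a small matmul plus a retrieved memory residual."""
    w = np.eye(d)                      # stand-in for learned weights (GPU-resident)
    retrieved = memory.lookup(hidden)  # fetched conditionally, adds no parameters
    return hidden @ w + 0.1 * retrieved

out = layer_with_memory(rng.normal(size=d))
print(out.shape)  # (16,)
```

The point of the sketch is the cost asymmetry: the lookup table can grow large on commodity storage while the per-token FLOPs and HBM-resident parameter count stay fixed, matching the summary's claim of enhancing the core model "without increasing FLOPs or parameter scale."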
AI Weekly | DeepSeek's New Model Surfaces; Musk Blasts ChatGPT Over Induced Suicides
Di Yi Cai Jing· 2026-01-25 01:31
Group 1
- DeepSeek has revealed a new model identifier, "MODEL1", in its FlashMLA code, suggesting a model nearing completion or deployment, potentially with a new architecture distinct from existing models [1]
- Elon Musk criticized ChatGPT for being linked to multiple suicide cases, while OpenAI's Sam Altman acknowledged the complexities of operating a large AI platform and the safety concerns surrounding AI technologies [2]
- Wang Xiaochuan responded to concerns about AI in healthcare, advocating a model in which AI assists doctors rather than replacing them and emphasizing patient benefit [3]

Group 2
- OpenAI's API business generated over $1 billion in annual recurring revenue last month, with projections of annual revenue exceeding $20 billion by 2025 [4]
- Baidu has established a new personal superintelligence business group, merging its document and cloud storage divisions, which is expected to strengthen its AI application capabilities [6]
- NVIDIA's CEO highlighted three major AI model breakthroughs over the past year, including the emergence of agentic AI and advances in open-source models [7]

Group 3
- Sequoia Capital is reportedly investing in AI unicorn Anthropic, which is raising over $25 billion in funding, potentially doubling its valuation to around $350 billion [8]
- Meta's new AI lab has delivered its first key models, although significant work remains before the technologies are fully operational for internal and consumer use [9]
- Musk's X platform has open-sourced its recommendation algorithm, which relies heavily on AI to customize user content [10][11]

Group 4
- Suiruan Technology reported cumulative losses exceeding 4 billion yuan over three years, with heavy dependence on sales to Tencent [12]
- Moore Threads expects its losses to narrow, projecting 2025 revenue of 1.45 to 1.52 billion yuan [13]
- Yushu Technology announced that it shipped over 5,500 humanoid robots last year, surpassing previous market estimates [14]

Group 5
- The "Qiming Plan" has been launched to build global consensus on AI safety measures, aiming to balance the opportunities and risks of rapid AI development [15]