Large Language Models
Zheshang Securities: LLM Technology Dividends Drive a New Round of Growth; E-commerce Platforms Are Entering a Dual Dividend Period
智通财经网· 2026-01-15 07:49
Group 1
- The core viewpoint is that the integration of AI and e-commerce is transitioning from "discriminative recommendation" to "generative recommendation (GR)", driven by the technological advantages of large language models (LLMs) [1]
- The industry is overcoming the limitations of traditional deep learning recommendation models (DLRMs) through the validation of the Scaling Law, leading to improved user retention and advertising click-through rates (CTR) on e-commerce platforms [1]
- Generative recommendation engines use LLMs to match across a vast catalog of products, significantly enhancing recommendation effectiveness, as demonstrated by Alibaba's introduction of a large user model (LUM) [1]

Group 2
- The Qianwen APP has rapidly increased its monthly active users (MAU), surpassing 100 million by January 14, 2026, and is expected to leverage Alibaba's ecosystem for further growth [2]
- Amazon's AI shopping assistant Rufus has transformed traditional search, allowing users to ask natural-language questions for product comparisons and recommendations, indicating a shift in e-commerce traffic entry points and distribution mechanisms [2]

Group 3
- Alibaba-W (09988) is a key recommendation, with additional focus on industry-chain targets such as Focus Media (002027.SZ), Worth Buying (300785.SZ), and others [3]
Software ETF E Fund (562930) Sees Net Inflows for Three Consecutive Days; Alibaba's "Qianwen Task Assistant 1.0" Launches, and the Pace of AI Application Commercialization Is Expected to Accelerate
Xin Lang Cai Jing· 2026-01-15 03:58
Group 1
- The software ETF E Fund (562930) saw active trading with a turnover rate of 15.53% and a transaction volume of 1.74 billion yuan as of January 15, 2026 [1]
- As of January 14, 2026, the latest scale of the software ETF E Fund reached 11.25 billion yuan, with 10.39 billion total shares, a new high in nearly one year [1]
- The software ETF E Fund has recorded net inflows for three consecutive days, with a maximum single-day net inflow of 397 million yuan and a cumulative total of 810 million yuan [1]

Group 2
- Recent innovations in large language model architecture have been highlighted, with DeepSeek's Engram module significantly improving knowledge storage and retrieval efficiency [2]
- Long-term forecasts suggest that AI applications will achieve breakthroughs in both consumer and business sectors by 2026, with attention on the moves of model and industry leaders [2]
- The "Artificial Intelligence + Manufacturing" initiative aims to launch 1,000 industrial intelligent agents and create 500 typical application scenarios by 2027, promoting the integration of AI technology into production control and process optimization [2]

Group 3
- The software ETF E Fund (562930) closely tracks the CSI Software Service Index, which selects 30 listed companies involved in software development and services to reflect the overall performance of the software service industry [3]
2 Million Reads Overnight and an Uncanny Sync with OpenAI: The Evaluation Framework That Tripped Up the World's Top LLMs
36Kr · 2026-01-15 01:26
This paper, led by a Chinese team, has gone viral overseas, racking up 2 million reads in a single night! The team, assembled by an MIT PhD who returned to China to found a startup, brought together 24 top institutions worldwide to deliver a shot in the arm for AI-assisted scientific discovery. Recently, a paper led by a Chinese team together with 24 top universities and research institutions worldwide, built to evaluate the capabilities of LLMs for Science, blew up on overseas social media. That night, François Chollet, creator of Keras (one of the most efficient and easy-to-use deep learning frameworks), shared the paper link, declaring: "We urgently need new ideas to push AI toward scientific innovation." After AI-field KOL Alex Prompter shared the paper's core abstract, Dallas Mavericks owner Mark Cuban reposted it, and Silicon Valley investors, European family offices, and sports media poured into the comments. In a single night, cumulative reads approached 2 million. Notably, in the same window, OpenAI also released an overview of a paper evaluating AI's capabilities in scientific discovery, "FrontierScience: Evaluating AI's Ability to Perform Scientific Research Tasks," pointing out that existing evaluation standards break down in the AI for Science field. An uncanny sync with OpenAI, discussion going viral overseas: what exactly is this piece of work ...
DeepSeek: Conditional Memory via Scalable Lookup, a New Dimension of Sparsity for Large Language Models (2026 Report)
Core Insights
- The article discusses a new architecture called "Engram" proposed by a research team from Peking University and DeepSeek-AI, which aims to enhance the capabilities of large language models (LLMs) by introducing a complementary dimension of "conditional memory" alongside existing "mixture of experts" (MoE) models [2][3]

Group 1: Model Architecture and Performance
- The core argument of the report is that language modeling comprises two distinct sub-tasks: combinatorial reasoning and knowledge retrieval, with the latter often being static and local [3]
- The Engram architecture modernizes the N-gram concept into a "conditional memory" mechanism, allowing direct retrieval of static embeddings with O(1) time complexity and thus freeing computational resources for higher-order reasoning tasks [3][4]
- A significant finding is the "sparsity distribution law," which indicates that allocating approximately 20% to 25% of the sparse parameter budget to the Engram module can significantly reduce validation loss while keeping computational costs unchanged [4]

Group 2: Efficiency and Scalability
- The Engram model (Engram-27B) outperformed a baseline MoE model (MoE-27B) on a range of knowledge-intensive and logic-intensive tasks, demonstrating its effectiveness in enhancing model intelligence [4][5]
- Engram's deterministic retrieval mechanism allows large embedding tables to be offloaded into host memory, significantly reducing dependence on GPU memory and enabling the deployment of ultra-large models on limited hardware [6][7]
- The architecture's multi-level cache structure, which exploits the Zipfian distribution of natural-language knowledge, can greatly benefit cloud service providers and enterprises aiming to reduce deployment costs [7]

Group 3: Long Context Processing
- Engram shows structural advantages in handling long contexts by directly resolving many local dependencies, allowing the Transformer to focus on capturing global long-range dependencies [8]
- In long-text benchmark tests, Engram-27B improved accuracy from 84.2% to 97.0% on multi-query retrieval tasks, indicating enhanced efficiency and better attention allocation [8]

Group 4: Future Implications
- The research signifies a shift in large-model design philosophy from merely increasing computational depth to a dual-sparsity approach that combines computation and memory [9]
- Conditional memory is expected to become a standard component of the next generation of sparse models, providing high-performance, low-cost solutions for trillion-parameter models [9]
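The core idea reported for Engram, retrieving a static embedding keyed by the local N-gram context in O(1) time, can be illustrated with a toy sketch. This is not DeepSeek's implementation: the table size, embedding width, N-gram length, and hash function below are all illustrative assumptions; the sketch only shows why such a lookup is constant-time and deterministic.

```python
import hashlib

import numpy as np

# Hypothetical sizes -- the real Engram module's dimensions are not given here.
TABLE_SIZE = 1 << 16   # number of embedding slots
EMBED_DIM = 64
NGRAM = 3

rng = np.random.default_rng(0)
memory_table = rng.standard_normal((TABLE_SIZE, EMBED_DIM)).astype(np.float32)

def ngram_key(tokens: tuple) -> int:
    """Hash an n-gram of token ids to a table slot: a single O(1) lookup,
    independent of model depth or sequence length."""
    digest = hashlib.blake2b(repr(tokens).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % TABLE_SIZE

def conditional_memory(token_ids: list) -> np.ndarray:
    """For each position, retrieve a static embedding keyed by the preceding
    n-gram; in the paper's framing this runs alongside the Transformer stream."""
    out = np.zeros((len(token_ids), EMBED_DIM), dtype=np.float32)
    for t in range(NGRAM - 1, len(token_ids)):
        key = ngram_key(tuple(token_ids[t - NGRAM + 1 : t + 1]))
        out[t] = memory_table[key]
    return out

states = conditional_memory([101, 7, 42, 42, 7, 101])
print(states.shape)  # (6, 64)
```

Because the lookup is a pure function of the local context, identical n-grams always retrieve identical embeddings, which is what makes deterministic prefetching and host-memory offloading possible in the first place.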
Integrating AI into Game Narratives and Gameplay: How Can Developers Avoid the Pitfalls?
36Kr · 2026-01-14 12:26
Core Viewpoint
- The integration of generative AI in gaming has drawn mixed reactions: many players find AI-generated dialogue dull and uncreative, while some industry experts see potential for innovation if it is used well [1][2][4]

Group 1: Current State of AI in Gaming
- Generative AI has permeated mainstream gaming, but its implementation has often resulted in poor-quality experiences, such as incorrect dialogue and low-quality graphics [1]
- Players have expressed skepticism toward AI-driven NPCs, with some arguing that interacting with a chatbot instead of a well-crafted story is foolish [1][2]
- Experts like Meg Jayanth criticize AI-generated dialogue as "boring" and lacking the depth that human writers provide, emphasizing the importance of human creativity in storytelling [4][5]

Group 2: Potential and Future of AI in Gaming
- There is a belief that, with careful guidance, generative AI could enhance game narratives and create more immersive experiences [2]
- Some experts suggest AI could be used effectively in new game genres, as in "1001 Nights" and "Infinite Craft," where AI is central to gameplay rather than a mere add-on [8][9]
- Dan Griliopoulos highlights the need for narrative designers to adapt to the evolving landscape, suggesting AI could enhance storytelling if integrated thoughtfully [11][12]

Group 3: Ethical and Practical Considerations
- Concerns about ethical implications, such as privacy risks and potential job losses in the industry, are prevalent among experts [5][11]
- Younès Rabii points out that while AI can generate content, it requires significant investment in training and resources to be effective, which may not be feasible for all developers [15][16]
- Chris Gardiner warns against over-reliance on AI, arguing it could erode the originality and depth that players value in games [18]
AAAI 2026 | AP2O-Coder Gives Large Models a "Wrong-Answer Notebook," Drilling Problems by Type as Efficiently as Humans Do
机器之心· 2026-01-14 05:37
Core Insights
- The article discusses the development of the Adaptive Progressive Preference Optimization (AP2O) method and its framework, AP2O-Coder, aimed at improving code generation and error correction in large language models (LLMs) [3][5][6]

Group 1: Existing Challenges and AP2O-Coder Design
- Current offline preference optimization methods face three main challenges: lack of error-type awareness, insufficient training focus, and weak dynamic adaptation [5][12]
- AP2O-Coder addresses these challenges with a systematic learning process modeled on human error-correction strategies, including error analysis and targeted optimization [6][8]

Group 2: AP2O-Coder Framework and Mechanism
- The AP2O-Coder framework consists of four key steps: code generation evaluation, error diagnosis analysis, progressive preference optimization, and adaptive error replay [10][11][14]
- Code generation evaluation builds an initial training dataset by generating candidate answers for programming tasks and labeling them as pass or fail [10]
- Error diagnosis analysis uses programming-language-specific tools to identify and categorize errors, creating a structured "error book" for targeted optimization [11]
- Progressive preference optimization corrects errors in a structured order, prioritizing error types according to model size [13]
- Adaptive error replay regularly evaluates model performance and adjusts the training-data distribution to focus on current weaknesses [14]

Group 3: Experimental Validation and Results
- The research team conducted systematic validation on six mainstream LLMs, achieving performance improvements of 2.8% to 3.4% on the EvalPlus benchmark, even for large models [16][18]
- AP2O-Coder demonstrated a significant reduction in error occurrence rates and improved generalization across various models [22][29]
- The method also showed higher sample efficiency, requiring only 4% to 60% of the preference data used by traditional methods to reach optimal performance [25]

Group 4: Adaptability of General LLMs
- AP2O-Coder is effective not only for code-specific LLMs but also for adapting general LLMs to coding tasks, as evidenced by significant performance improvements in models like Qwen3 and Llama3 [28]
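The bookkeeping behind the "error book" and adaptive replay steps can be sketched in a few lines. This is only an illustration of the idea, not the paper's implementation: the error taxonomy, the `passed`/`error_type` record format, and the proportional replay weighting below are all assumptions made for the sketch.

```python
import random
from collections import Counter

# Hypothetical error taxonomy; the paper's actual categories are not listed here.
ERROR_TYPES = ["SyntaxError", "TypeError", "AssertionError", "Timeout"]

def build_error_book(eval_results):
    """Error diagnosis step: tally how often each error type occurs across
    failed generations -- the model's 'wrong-answer notebook'."""
    book = Counter()
    for sample in eval_results:
        if not sample["passed"]:
            book[sample["error_type"]] += 1
    return book

def replay_weights(book):
    """Adaptive replay step: weight each error type in proportion to how
    often it still occurs, so training focuses on current weaknesses."""
    total = sum(book.values()) or 1
    return {etype: book.get(etype, 0) / total for etype in ERROR_TYPES}

def sample_error_type(book, rng=random.Random(0)):
    """Draw the next error type to drill, biased toward frequent failures."""
    weights = replay_weights(book)
    return rng.choices(ERROR_TYPES, weights=[weights[e] for e in ERROR_TYPES])[0]

results = [
    {"passed": False, "error_type": "TypeError"},
    {"passed": False, "error_type": "TypeError"},
    {"passed": True,  "error_type": None},
    {"passed": False, "error_type": "SyntaxError"},
]
book = build_error_book(results)
print(replay_weights(book))
```

Re-running the tally after each training round and resampling from the updated weights is what makes the loop "adaptive": error types the model has fixed fade out of the replay distribution automatically.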
DeepSeek Paper Reveals a New Model Mechanism; Demand for SSDs and Other Storage May Rise Further, and a Sector Leader Has Posted Blowout Results
Xuan Gu Bao· 2026-01-13 23:24
Group 1
- DeepSeek published a new paper proposing "conditional memory" as a new dimension of sparsity, optimizing large language models through the Engram module [1]
- The existing Transformer architecture lacks a native knowledge retrieval mechanism, leading it to simulate retrieval behavior inefficiently [1]
- Conditional memory complements the MoE (Mixture of Experts) approach and, under equal parameter and computational budgets, significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks [1]

Group 2
- The Engram module is a large, scalable embedding table that acts as external memory for Transformers, allowing efficient retrieval of nearby content [2]
- Engram caches frequently accessed embeddings in faster storage media while keeping less frequently accessed data in larger, slower storage, maintaining low access latency [2]
- The NAND industry is expected to see limited capital expenditure over the next two years, with leading manufacturers likely to prioritize HBM over NAND, while AI applications are anticipated to drive SSD demand [2]

Group 3
- Baiwei Storage forecasts a net profit of 850 million to 1 billion yuan for the year, representing year-on-year growth of 427.19% to 520.22% [2]
- Jiangbolong has launched several high-speed enterprise-level eSSD products, covering mainstream capacities from 480GB to 7.68TB [3]
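The tiered caching described above, hot embeddings in fast memory, the long tail in larger and slower storage, works because natural-language lookups are highly skewed (Zipfian). A minimal sketch, with an LRU dict standing in for the fast tier (e.g. GPU/HBM) and a plain dict for the slow tier (e.g. host DRAM or SSD); the class name, capacities, and access pattern are illustrative assumptions, not part of the Engram design.

```python
from collections import OrderedDict

class TieredEmbeddingStore:
    """Toy two-tier store: a small LRU 'fast' cache backed by a large 'slow'
    dict. Under a skewed access pattern, most lookups hit the fast tier."""

    def __init__(self, slow_store, fast_capacity):
        self.slow = slow_store
        self.fast = OrderedDict()
        self.capacity = fast_capacity
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as recently used
            self.hits += 1
            return self.fast[key]
        self.misses += 1
        value = self.slow[key]          # slow-tier fetch
        self.fast[key] = value
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict least recently used
        return value

# Skewed, Zipf-like access: key 0 is requested far more often than the tail.
store = TieredEmbeddingStore({k: k * 1.0 for k in range(1000)}, fast_capacity=10)
for _ in range(100):
    store.get(0)
for k in range(50):
    store.get(k)
print(store.hits, store.misses)  # 100 50
```

Even with a fast tier holding only 1% of the keys, two thirds of the accesses in this toy run are hits, which is the intuition behind keeping average access latency low while the bulk of the table lives off-GPU.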
Bridgewater's New Move in the Chinese Market
Core Insights
- Bridgewater Associates is hiring for a "China Policy AI Research Assistant" position, indicating a strategic focus on China and on integrating AI into macroeconomic research [1][3]
- The role aims to deepen understanding of China's policy environment and its impact on assets and the economy, using AI tools for data processing and trend identification [3][4]

Group 1: Bridgewater's Strategic Focus
- The recruitment signals Bridgewater's preparation to increase its focus on the Chinese market by 2026, amid growing global macroeconomic uncertainty [3][6]
- Bridgewater's Asia Strategy Team aims to develop leading investment research and strategies to navigate evolving geopolitical and macroeconomic landscapes [3][4]

Group 2: AI Integration in Investment Research
- The trend of combining discretionary research with AI is gaining traction, with Bridgewater exemplifying the shift by establishing an AI lab to pursue excess returns through machine learning [4][5]
- The hiring strategy reflects a transformation toward employing more data scientists, as stated by Greg Jensen, Co-CIO of Bridgewater [4][5]

Group 3: Market Diversification Insights
- Bridgewater's analysis highlights the risk of high concentration in U.S. assets, suggesting a shift toward Asian and emerging markets for better diversification [6]
- The firm recommends that global equity allocations outside the U.S. should at least match those in U.S. markets, emphasizing the timing for tactical investments in non-U.S. markets [6]

Group 4: Positive Outlook on Chinese Assets
- Several foreign investment giants express optimism about the performance of Chinese assets in 2026, particularly in the technology sector [7]
- There has been a notable inflow of funds into various U.S.-listed Chinese stock ETFs, indicating growing interest from foreign investors [7]
Liang Wenfeng Co-authors DeepSeek's Latest Paper, Proposing a New Method to Break Through GPU Memory Limits
Xin Lang Cai Jing· 2026-01-13 12:33
Core Viewpoint
- DeepSeek, a Chinese AI startup, has developed a new model training technique that works around GPU memory limitations, improving cost efficiency and performance in AI model training [1][3]

Group 1: Technology and Innovation
- DeepSeek and researchers from Peking University introduced a "conditional memory" technique called "Engram" to address the limits that high-bandwidth memory (HBM) places on scaling AI models [3][4]
- The Engram technology allows more efficient retrieval of foundational information by decoupling computation from storage, improving the model's performance on long contexts [4][6]
- In a model with 27 billion parameters, the new technique improved performance on key industry benchmarks by several percentage points while preserving capacity for complex reasoning tasks [4][6]

Group 2: Competitive Landscape
- The HBM gap between China and the US is significant, with Chinese memory-chip manufacturers lagging behind their US and South Korean counterparts [4]
- DeepSeek's previous model, DeepSeek-R1, was trained in two months at a cost of $5.5 million, far below the expenses incurred by US companies like OpenAI, while achieving comparable performance [6][7]
- Microsoft President Brad Smith noted that Chinese companies like DeepSeek are rapidly gaining ground in the global AI market, particularly in emerging markets, thanks to their low-cost open-source models [7]

Group 3: Future Developments
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to possess strong programming capabilities [7]
Wang Xiaochuan Plans to Build Another IPO
Di Yi Cai Jing· 2026-01-13 12:31
Core Insights
- Baichuan Intelligent aims for an IPO around 2027 and currently has nearly 3 billion yuan in available funds [4]

Industry Trends
- The AI healthcare sector is growing rapidly, with major players entering the market, including OpenAI and Anthropic [2]
- Competition in AI healthcare is intensifying, with significant investments and talent acquisition by companies like Ant Group [2]

Company Strategy
- Baichuan Intelligent is focusing on the medical sector, particularly pediatrics and oncology, and has established partnerships with Beijing Children's Hospital and the Cancer Hospital of the Chinese Academy of Medical Sciences [3]
- The company plans to release two consumer-oriented medical products in the first half of the year, initially offering them for free to build trust and reputation before introducing paid features [3]

Product Development
- Baichuan Intelligent has launched a new open-source medical language model, Baichuan-M3, which has performed well in medical AI evaluations and possesses advanced questioning capabilities [2]
- The model aims to improve medical decision-making by providing patients with comprehensive information and risk assessments, thereby improving healthcare efficiency [3]