DeepSeek thanks Tencent's technical team: the DeepEP optimization is a "huge speedup" code contribution
Xin Lang Ke Ji· 2025-05-07 11:12
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, achieving significant performance improvements across network environments: a 100% performance increase on RoCE networks and a 30% increase on IB networks, enhancing the efficiency of large AI model training [1][2]
Group 1: Technical Enhancements
- The optimization replaced IBRC with IBGDA and gave each channel its own Queue Pair (QP) for parallel data transmission, improving both the robustness and the communication performance of the normal kernels (the scheduling idea is illustrated in the sketch after this summary) [1]
- The optimized framework reached an algorithm bandwidth of 58 GB/s in RDMA scenarios, with the corresponding physical bandwidth calculated at 43.5 GB/s [1]
Group 2: Industry Impact
- Since DeepSeek open-sourced DeepEP along with its other components in February, the framework has demonstrated a 300% increase in communication efficiency, addressing the dependency of MoE-architecture large models on NVIDIA NCCL [2]
- The optimizations have already been applied in Tencent's Hunyuan model projects, showing strong versatility in high-performance environments built on Tencent's Xingmai network and H20 servers [2]
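The per-channel Queue Pair change is easiest to picture as giving every communication channel its own independent transmit queue instead of funneling all channels through one shared queue. The sketch below is a purely conceptual illustration of that scheduling idea using Python threads; it is not DeepEP or IBGDA code, and the channel count and chunk sizes are arbitrary.

```python
# Conceptual illustration only: one worker ("QP") per channel lets channels
# drain in parallel instead of serializing behind a single shared queue.
# This is NOT DeepEP/IBGDA code; it just models the scheduling idea.
import queue
import threading
import time

NUM_CHANNELS = 4
MESSAGES_PER_CHANNEL = 3

def qp_worker(channel_id: int, q: "queue.Queue") -> None:
    while True:
        chunk = q.get()
        if chunk is None:           # sentinel: this channel is finished
            return
        time.sleep(0.01)            # stand-in for the wire transfer time
        print(f"channel {channel_id} sent {len(chunk)} bytes")

channels = [queue.Queue() for _ in range(NUM_CHANNELS)]
workers = [threading.Thread(target=qp_worker, args=(i, q)) for i, q in enumerate(channels)]
for w in workers:
    w.start()

# Each channel enqueues onto its own "QP", so transfers overlap across channels.
for q in channels:
    for _ in range(MESSAGES_PER_CHANNEL):
        q.put(b"x" * 4096)
    q.put(None)

for w in workers:
    w.join()
```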
Do You Really Know How to Use DeepSeek?
Sou Hu Cai Jing· 2025-05-07 04:04
Core Insights
- The article discusses a transformation in the AI industry: a shift from using individual AI models to a collaborative network of agents, termed an "agent collaboration network" [8][10][27]
- It highlights the urgency for AI professionals to move from prompt engineering to organizing and managing AI collaborations, as traditional skills may become obsolete [9][21][30]
Group 1: Industry Trends
- The AI landscape is evolving toward multi-agent systems in which agents communicate and collaborate autonomously, moving away from reliance on human prompts [27][14]
- Protocols such as MCP (Model Context Protocol) and A2A (Agent-to-Agent) are facilitating this transition by standardizing communication between different AI systems (a toy message-passing sketch follows this summary) [36][37]
- Major companies including Alibaba, Tencent, and ByteDance are rapidly building platforms that support these protocols, making it easier to integrate and deploy AI agents [38][39]
Group 2: Skills Transformation
- AI professionals need to transition from prompt engineers to "intent architects," focusing on defining task languages and collaboration protocols for agents [29][30]
- The role of AI practitioners is shifting from using agents to organizing and managing multiple agents, a mindset closer to building a digital team [30][31]
- Professionals are urged to learn agent frameworks and communication protocols, and to register their own tools as agent capabilities within larger networks [33][34]
Group 3: Practical Applications
- Platforms and frameworks such as LangGraph, AutoGen, and CrewAI let AI professionals practice and implement these new skills [41]
- The infrastructure for agent protocols is being established, giving AI professionals opportunities to engage with these technologies early [41][42]
- The article likens the current stage to the early days of TCP/IP, suggesting that those who adapt early will hold a competitive advantage in the evolving AI landscape [42]
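To make the "agents talking to agents" idea concrete, here is a minimal, framework-free sketch of one agent delegating a task to another via structured messages. The message fields and agent names are invented for illustration; this is not the actual MCP or A2A wire format.

```python
# Toy agent-to-agent exchange: a "planner" agent delegates a task to a
# "writer" agent via structured messages. The message schema is invented
# for illustration; real MCP/A2A messages define their own formats.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Message:
    sender: str
    recipient: str
    task: str
    payload: dict = field(default_factory=dict)

class Agent:
    def __init__(self, name: str, skills: Dict[str, Callable[[Message], str]]):
        self.name = name
        self.skills = skills  # the agent's "capability registry"

    def handle(self, msg: Message) -> str:
        # Dispatch the incoming task to the matching skill.
        return self.skills[msg.task](msg)

writer = Agent("writer", {"draft": lambda m: f"Draft about {m.payload['topic']}."})

def plan_and_delegate(topic: str) -> str:
    msg = Message(sender="planner", recipient="writer", task="draft",
                  payload={"topic": topic})
    return writer.handle(msg)

print(plan_and_delegate("agent collaboration networks"))
```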
DeepSeek: The Possibility of a "Periphery Revolution"
36Ke· 2025-05-07 02:34
Core Insights
- DeepSeek, a Chinese tech company focused on artificial general intelligence, has gained significant attention in the global AI landscape with its open-source inference model, which is free for commercial use and supports specific development and application scenarios [1][2]
- DeepSeek's success highlights the potential for a "periphery revolution," in which emerging players disrupt established dominance in the AI sector, particularly as developing countries gain access to AI technologies [2][3]
- DeepSeek's operating model serves as a case study for building and strengthening AI platforms in China, showing that mastery of foundational technologies does not guarantee control over value distribution in a network industry [3][4]
Summary by Categories
Product and Innovation
- DeepSeek's open-source inference model allows free commercial use and supports complex tasks such as text generation, natural language understanding, and programming, demonstrating strong application design and secondary-development features [1]
- The company's success is seen as a catalyst for the adoption of open-source AI models, marking a significant moment for the AI industry [1][2]
Market Dynamics
- DeepSeek's emergence suggests a narrowing gap between China and the U.S. in AI capabilities, particularly since the release of its V3 models, which have accelerated the pace of innovation in the sector [3][4]
- The shift toward free or low-cost AI services is expected to drive rapid industrial application, as many large-model service providers have moved to free pricing [4]
Industry Implications
- The rise of small teams like DeepSeek shows that significant innovation can come from smaller organizations, challenging the notion that only large companies with substantial resources can lead AI development [4]
- The article stresses the need for a well-designed policy framework for domestic and international industrial cycles, ensuring that technological advances align with national interests while avoiding past pitfalls of reckless competition [5][6]
Education and Knowledge Dissemination
- The advent of large models calls for a fundamental transformation in education, shifting the focus from rote memorization to innovation and practical application, since these models serve as powerful knowledge aids [7][8]
- The concept of "open knowledge" is highlighted: access to cutting-edge information is democratized through large models, enabling individuals to learn and innovate faster [9][10]
User Manual for DeepSeek and Other Large-Model Tools (Hands-On Edition) - Xiamen University Team
Sou Hu Cai Jing· 2025-05-06 14:37
The report was compiled by Xiamen University's big data teaching team. It focuses on AIGC technology and large-model tools such as DeepSeek, covering both theory and hands-on cases, and serves as a comprehensive guide to using AIGC technology.
1. Fundamentals of AIGC
- Core concepts: AIGC (AI-generated content) uses generative adversarial networks, large pre-trained models, and multimodal techniques to generate various kinds of content from inputs, with broad applications across many fields. AIGC and large models reinforce each other: large models provide the technical foundation, while AIGC drives the development of large models.
- Development history: AIGC has moved through early germination, gradual accumulation, and rapid growth. It now makes wide use of large-model technology, plays an important role across industries, changes how content is created, raises productivity, creates new professions, and drives industry transformation.
2. Text-based AIGC applications
- Application scenarios: widely used in news, advertising, literary creation, and other fields, for example automatically generated news reports and personalized ad copy.
- Tool usage: illustrated with DeepSeek and Baidu Wenxin Yiyan. With DeepSeek, ask simple, direct questions, refine results over multiple dialogue turns, and use "magic" instructions to improve efficiency (a minimal API sketch follows this summary); with Wenxin Yiyan, spell out the desired style, structure, role, content, and genre, and guide it step by step for complex problems. The report also covers iFlytek Zhiwen, generating PPTs with DeepSeek + Kimi, and building mind maps with DeepSeek + xmind.
3. Image-based AIGC applications: covering creative image generation, AI photo retouching, image ...
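The manual's tip about refining results over multiple dialogue turns maps onto a plain chat-completions loop. Below is a minimal sketch, assuming an OpenAI-compatible chat endpoint at https://api.deepseek.com and a model named `deepseek-chat`; both identifiers are assumptions here and should be checked against the provider's current documentation.

```python
# Minimal multi-turn dialogue sketch against an OpenAI-compatible chat API.
# Assumptions: the `openai` SDK is installed, DEEPSEEK_API_KEY is set, and the
# endpoint/model names below match the provider's current documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

messages = [{"role": "system", "content": "You are a concise writing assistant."}]

def ask(user_text: str) -> str:
    """Append the user turn, call the model, and keep the reply in history."""
    messages.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(model="deepseek-chat", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

# First ask plainly, then refine over further turns, as the manual suggests.
print(ask("Draft a 100-word product announcement for a note-taking app."))
print(ask("Make it more formal and add a one-line call to action."))
```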
China's AI application market saw dramatic change in Q1: DeepSeek tops the charts, Tencent Yuanbao surges, Kimi stalls, and the "scenario is king" era begins | Q1 2025 AI Application Value Rankings
Mei Ri Jing Ji Xin Wen· 2025-05-06 11:29
Core Insights
- The domestic AI application market underwent a significant reshuffle in Q1 2025, shifting from an "arms race" over model parameters to a competitive landscape centered on "application ecosystems" [3]
- The report argues that, as underlying model capabilities become homogenized, the key to growth lies in deeply integrating AI capabilities with specific scenarios, supported by effective commercialization and marketing strategies [3]
Group 1: Market Leaders and Trends
- DeepSeek emerged as the dominant player, with an average of 81.13 million monthly download users and nearly 187 million monthly active users (MAU), indicating a very strong user base [17]
- Tencent Yuanbao showed remarkable growth, with 13.43 million monthly downloads (up nearly 1,500%) and an MAU of 23.58 million, reflecting aggressive marketing and user-acquisition strategies [20]
- Doubao held a solid second position with 27.24 million monthly downloads and an MAU of 99.81 million, although its growth rate has slowed relative to competitors [21]
Group 2: Competitive Dynamics
- Kimi, once a strong contender, declined to 8.34 million monthly downloads (down 3.9%) and an MAU of 21.65 million, indicating significant growth pressure [24]
- The general AI assistant market is becoming saturated, with established players such as Baidu Wenxiaoyan and iFlytek seeing declines in both downloads and MAU [27]
- In contrast, specialized AI applications such as "Nano AI Search" and "Lovekey" have grown strongly, indicating a shift toward scenario-based applications [32]
Group 3: Future Outlook
- The report points to a "Matthew effect" in which top players like DeepSeek and Doubao dominate, capturing nearly 90% of the total MAU among the top 20 applications [34]
- Capital and marketing remain crucial growth drivers, as Tencent Yuanbao's success shows, while Kimi's experience highlights the unsustainability of purely financing-driven growth [37]
- The future of AI applications will focus on solving specific pain points and delivering unique value through "AI + scenario" applications, moving away from generic tools and emotional companionship [38]
An academician in his eighties "uncannily predicted" the birth of DeepSeek! "I really didn't expect to become a prophet"
Huan Qiu Wang Zi Xun· 2025-05-06 09:33
Core Insights
- Chen Runsheng is a pioneer of non-coding gene research and a participant in the Human Genome Project, one of the largest life-science projects in the world [1][2]
- He emphasizes that the future of AI in China lies not in the quantity of chips but in the density of intelligent computing [1]
Group 1: Contributions to Genomics
- Under Chen's leadership, China became the sixth country in the world with large-scale genome-sequencing capabilities [2][6]
- In 1999, China joined the International Human Genome Project, taking on the sequencing of roughly 30 million base pairs on the short arm of human chromosome 3, about 1% of the entire project [6]
- Chen's team innovated its sequencing methods and completed its tasks two years ahead of schedule, demonstrating significant advances in genomic research [6]
Group 2: Discoveries in Non-Coding DNA
- Chen found that only 2%-3% of the human genome encodes proteins, while 97% consists of non-coding sequences once dismissed as "junk DNA" [6][7]
- His team focused on these non-coding regions, leading to the identification of new disease-related loci, particularly in cancer research [7]
Group 3: Open Science and Collaboration
- Since 1993, Chen's team has built a comprehensive database of 640,000 non-coding molecular records, which it chose to share openly with the global scientific community [7]
- Chen believes science is a collective human endeavor and stresses the importance of sharing research findings for the advancement of knowledge [7]
Group 4: AI and Future Innovations
- Chen has worked on AI since the late 1980s, applying artificial neural networks to predict coding genes [8]
- His current work integrates traditional Chinese medicine data into AI models, aiming to build a platform that merges different medical perspectives [8]
- He advocates viewing AI not merely as a tool but as a new center of innovation, which could open up more creative possibilities in research and development [8]
AI Artificial Intelligence ETF (512930) and Consumer Electronics ETF (561600) push for a third straight gain; Online Consumption ETF Fund (159793) rises nearly 3%; DeepSeek releases its Prover-V2 model
Xin Lang Cai Jing· 2025-05-06 02:28
Group 1: AI Industry Developments
- The AI Artificial Intelligence ETF (512930) has performed strongly, rising 1.53% recently for its third consecutive gain, with a latest price of 1.33 yuan [3]
- DeepSeek released a new model, DeepSeek-Prover-V2-671B, which has 671 billion parameters and uses the more efficient safetensors file format, improving training and deployment efficiency (a small inspection sketch follows this summary) [4]
- Alibaba has open-sourced its new Qwen3 model, which cuts parameter count by two-thirds compared with its predecessor while outperforming major models such as DeepSeek-R1 and OpenAI-o1, indicating a significant advance in AI capabilities [4]
Group 2: Market Performance and Trends
- The AI Artificial Intelligence ETF had a recent trading volume of 39.56 million yuan, a turnover rate of 2.1%, and a one-month average trading volume of 93.04 million yuan [3]
- The consumer electronics sector is also performing well: the Consumer Electronics ETF (561600) rose 1.15% to a recent price of 0.79 yuan, up 1.16% over the past two weeks [7]
- The Online Consumption ETF (159793) gained 2.60%, with a latest price of 0.91 yuan and a two-week cumulative increase of 2.08% [9]
Group 3: Index Composition and Weighting
- The top ten weighted stocks in the AI Artificial Intelligence Theme Index (930713) account for 49.82% of the index, including Cambricon (688256) and Hikvision (002415) [10]
- The top ten stocks in the Consumer Electronics Theme Index (931494) represent 53.05% of the index, featuring companies such as Luxshare Precision (002475) and SMIC (688981) [18]
- The top ten weighted stocks in the Online Consumption Theme Index (931481) account for 57.55%, including major players such as Alibaba-W (09988) and Tencent Holdings (00700) [15]
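The note that Prover-V2-671B ships in the safetensors format can be made concrete with a small inspection script, since the format can be read tensor by tensor without loading the whole checkpoint. A minimal sketch, assuming the `safetensors` and `torch` packages are installed and a shard has been downloaded locally (the filename below is a placeholder, not an actual checkpoint path):

```python
# Inspect a safetensors shard without loading full tensors into memory.
# Assumptions: `pip install safetensors torch`, and `shard.safetensors` is a
# locally downloaded file (placeholder name).
from safetensors import safe_open

with safe_open("shard.safetensors", framework="pt", device="cpu") as f:
    for name in list(f.keys())[:5]:          # peek at the first few entries
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```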
Li Yanhong Says DeepSeek's Hallucination Rate Is High. Is That True?
36Ke· 2025-05-02 04:29
Core Insights
- The article discusses the hallucination problem in large language models (LLMs), focusing on DeepSeek-R1, which shows a high hallucination rate compared with its predecessor and other models [2][6][13]
- Li Yanhong criticizes DeepSeek-R1 for its limitations, including a high hallucination rate, slow performance, and high cost, sparking a broader discussion about hallucinations in AI models [2][6][19]
- The hallucination phenomenon is not unique to DeepSeek: other models, such as OpenAI's o3/o4-mini and Alibaba's Qwen3, also exhibit significant hallucination issues [3][8][13]
Summary by Sections
Hallucination Rates
- DeepSeek-R1 has a hallucination rate of 14.3%, far higher than DeepSeek-V3's 3.9%, a nearly fourfold increase [6][7]
- Other models show even higher rates; Qwen-QwQ-32B-Preview reaches 16.1% [6][7]
- OpenAI's o3 model has a hallucination rate of 33%, nearly double that of its predecessor o1, while the lightweight o4-mini model reaches 48% [8][10]
Industry Response
- The AI industry is grappling with persistent hallucinations, which complicate the development of more advanced models [13][19]
- Companies are exploring various mitigations, including retrieval-augmented generation (RAG) and strict data-quality control (a toy RAG retrieval sketch follows this summary) [20][22][23]
- Despite advances in areas such as multimodal output, hallucinations remain a significant challenge for long-text generation and complex visual scenarios [18][19]
Implications of Hallucinations
- Hallucinations are increasingly seen as a common trait of advanced models, raising questions about reliability and user trust, especially in professional or high-stakes contexts [17][27]
- Hallucinations may also contribute to creativity, since they can lead to unexpected and imaginative outputs [24][26]
- Accepting hallucination as an inherent characteristic of AI models suggests a needed shift in how AI is perceived and used [27]
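Among the mitigations listed, retrieval-augmented generation is the most mechanical: ground the model's answer in retrieved passages rather than letting it free-associate. Below is a minimal, self-contained sketch of the retrieval-and-prompting step; the corpus and the bag-of-words scoring are illustrative only, and the final model call is left as a comment rather than asserted for any particular provider.

```python
# Toy retrieval-augmented generation skeleton: retrieve the most relevant
# passages, then build a grounded prompt. Real systems would use dense
# embeddings and an actual LLM call; both are simplified here.
from collections import Counter
import math

corpus = [
    "DeepSeek-R1 is a reasoning-focused large language model.",
    "Retrieval-augmented generation grounds answers in source documents.",
    "Hallucination means the model states unsupported claims as fact.",
]

def bow(text: str) -> Counter:
    """Bag-of-words term counts for a toy similarity measure."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list:
    scored = sorted(corpus, key=lambda doc: cosine(bow(query), bow(doc)), reverse=True)
    return scored[:k]

query = "Why do language models hallucinate?"
context = "\n".join(retrieve(query))
prompt = f"Answer using ONLY the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # this grounded prompt would then be sent to the model
```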
BMW China Announces DeepSeek Integration. Has BMW Compromised?
36Ke· 2025-05-02 02:21
Core Viewpoint
- BMW China is embracing local AI technology by integrating DeepSeek, a significant step in its digital transformation strategy that strengthens its AI capabilities in the Chinese market [1][3][6]
Group 1: BMW's AI Integration
- BMW has announced the integration of DeepSeek, which will enhance the BMW Intelligent Personal Assistant and improve human-machine interaction in new models starting in Q3 2025 [1][2]
- The collaboration with DeepSeek follows BMW's earlier partnership with Alibaba to develop AI language models, underscoring BMW's commitment to the local AI ecosystem [1][3]
Group 2: Strategic Importance of Local AI
- The move signals BMW's recognition of the importance of local AI technologies and its willingness to adapt to the rapidly evolving Chinese automotive market [3][4]
- BMW's earlier initiatives, such as its 360-degree AI strategy and intelligent systems like "Car Expert" and "Travel Companion," reflect its ongoing efforts to upgrade its smart-vehicle offerings [3][4]
Group 3: Challenges and Opportunities
- Despite its historical strengths in manufacturing and brand image, BMW faces challenges in keeping pace with rising demand for smart, connected vehicles [4][5]
- The partnership with DeepSeek is a strategic decision to accelerate BMW's digital transformation and leverage the advanced technologies and innovative models of Chinese tech companies [4][6]
DeepSeek Open-Sources a New Model with Greatly Improved Mathematical Reasoning
Hu Xiu· 2025-05-01 00:48
Core Insights
- DeepSeek has officially released DeepSeek-Prover-V2 on Hugging Face, continuing its open-source momentum with two versions launched [1][4]
- The training core of DeepSeek-Prover-V2 combines "recursion + reinforcement learning," enabling the model to break complex theorems into sub-goals and reasoning paths [3][8]
Model Specifications
- DeepSeek-Prover-V2-7B is based on the previous V1.5 model and supports a maximum context input of 32K [4]
- DeepSeek-Prover-V2-671B is trained on DeepSeek-V3-Base and delivers the strongest reasoning performance [4]
Training Process
- Training proceeds in two phases: the first focuses on the rapid mode and uses an "expert iteration" method in which successful answers are fed back to refine the model [5]
- The second phase trains more complex logical reasoning, incorporating mathematical knowledge from DeepSeek-V3 along with formal data [6]
Reinforcement Learning
- The GRPO reinforcement learning algorithm is introduced to strengthen reasoning, letting the model learn on its own to select the best solution from multiple candidates [8]
- The system generates 32 different proof schemes for each theorem and retains only those verified as correct by the Lean verification system (sketched after this summary) [9]
Model Distillation
- After building the 671B model, the team distilled its capabilities into a smaller 7B model, giving users near-equivalent mathematical reasoning on resource-limited devices [10][11]
Reasoning Modes
- The rapid mode (non-CoT) favors speed, generating concise Lean code answers without showing the thought process, suitable for handling large numbers of problems [12]
- The logical mode (CoT) details each step of the reasoning process, ensuring clarity and transparency [12]
Performance Evaluation
- In the final evaluation, DeepSeek-Prover-V2-671B achieved an 88.9% pass rate on the MiniF2F test and solved 49 problems from the PutnamBench dataset [17]
New Dataset
- DeepSeek also introduced ProverBench, a new formal mathematics dataset containing 325 problems across domains including number theory, algebra, and calculus [18][19]
Comparison and Trends
- The comparison shows a clear trend: the performance gap between large language models on "informal mathematical reasoning" and "formal mathematical reasoning" is narrowing [21]
- Evolving model architectures and training strategies are enabling models to produce rigorous, verifiable mathematical proofs [22]
Future Directions
- DeepSeek-Prover-V2 signals a shift in focus from merely generating content to generating structured logic, which may touch on the foundational structure of artificial general intelligence [33][34]
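The "generate 32 candidate proofs, keep only the ones Lean accepts" step described above is, at its core, verification-filtered sampling. The sketch below shows that loop with the prover model and the Lean checker stubbed out; the function names and acceptance rate are placeholders, and only the candidate count of 32 comes from the summary.

```python
# Verification-filtered sampling: draw many candidate proofs per theorem and
# keep only those the proof checker accepts. The sampler and checker below are
# stubs; a real pipeline would call the prover model and the Lean toolchain.
import random

def sample_proof(theorem: str) -> str:
    """Placeholder for querying the prover model for one candidate proof."""
    return f"candidate proof {random.randint(0, 9999)} for: {theorem}"

def lean_verifies(theorem: str, proof: str) -> bool:
    """Placeholder for running the candidate through the Lean verifier."""
    return random.random() < 0.1  # pretend ~10% of candidates check out

def verified_proofs(theorem: str, num_candidates: int = 32) -> list:
    candidates = [sample_proof(theorem) for _ in range(num_candidates)]
    return [p for p in candidates if lean_verifies(theorem, p)]

# Verified proofs would then be fed back as training data (expert iteration).
kept = verified_proofs("forall a b : Nat, a + b = b + a")
print(f"kept {len(kept)} of 32 candidates")
```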