Breaking! DeepSeek Paper by Liang Wenfeng Makes the Cover of Nature, Directly Addressing Distillation Allegations
程序员的那些事· 2025-09-20 01:10
On September 18, the DeepSeek-R1 reasoning-model paper, completed by the DeepSeek team with Liang Wenfeng as corresponding author, appeared on the cover of the prestigious international journal Nature. Compared with the initial DeepSeek-R1 paper released in January this year, this version discloses more details of the model's training and directly addresses the distillation allegations raised at the model's launch. DeepSeek-R1 is the world's first mainstream large language model to undergo peer review; nearly all mainstream large models have so far gone without independent peer review, a gap that has "finally been broken by DeepSeek." Nature's cover introduction reads: "Trained large models tend to solve problems better when they can plan the steps needed to solve them. This 'reasoning' resembles the way humans tackle more complex problems, but it poses great challenges for artificial intelligence, requiring human intervention to add labels and annotations. In this week's issue, DeepSeek's researchers reveal how they were able to train a model to reason with minimal human input. The DeepSeek-R1 model is trained with reinforcement learning, in which the model receives a high-score reward for correctly solving a math problem and is penalized for a wrong answer. As a result, it learned to reason, working through problems step by step and revealing those steps, and became more likely to arrive at correct answers. This makes DeepSeek …
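The reward scheme Nature describes (score the model on final-answer correctness alone, with no step-level annotation) can be illustrated with a minimal rule-based reward function. This is only a sketch, not DeepSeek's actual implementation; the `\boxed{}` answer format, the extraction regex, and the +1/-1 reward values are assumptions for illustration.

```python
import re

def correctness_reward(model_output: str, reference_answer: str) -> float:
    """Rule-based reward: +1.0 if the final boxed answer matches the
    reference, -1.0 otherwise. Illustrative only; the real pipeline
    uses its own verifiers and reward values."""
    # Assume the model emits its final answer as \boxed{...} (an assumption).
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return -1.0  # no parseable final answer counts as incorrect
    answer = match.group(1).strip()
    return 1.0 if answer == reference_answer.strip() else -1.0

# A step-by-step solution ending in a boxed answer earns the reward.
output = r"First compute 3 * 4 = 12, then add 5. \boxed{17}"
print(correctness_reward(output, "17"))  # -> 1.0
```

Because the reward depends only on the outcome, the model is free to discover intermediate reasoning steps on its own, which is the behavior the cover text highlights.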
DeepSeek Team's Paper With Liang Wenfeng Makes the Cover of Nature
Zheng Quan Shi Bao Wang· 2025-09-19 04:46
Core Viewpoint
- The research paper on the DeepSeek-R1 reasoning model, led by Liang Wenfeng, demonstrates that the reasoning capabilities of large language models (LLMs) can be enhanced through pure reinforcement learning, reducing the need for human input in performance improvement [1]

Group 1
- The study indicates that LLMs do not need to rely on human examples or complex instructions, as they can autonomously learn to generate reasoning processes through trial-and-error reinforcement learning [1]
- The AI exhibits self-reflection, which is considered a significant indication of artificial intelligence exploring cognitive pathways beyond human thinking [1]
GPT-4o Trained on "波多野结衣" 2.6 Times More Often Than on "您好"
猿大侠· 2025-09-19 04:11
Core Viewpoint
- The article discusses the contamination of language models, particularly GPT, by inappropriate content, highlighting the prevalence of certain terms related to adult entertainment in the training data [4][10]

Group 1: Research Findings
- Researchers from Tsinghua University and Nanyang Technological University identified that popular language models like ChatGPT are contaminated by certain "PoC tokens," defined as "polluted Chinese tokens" [6][4]
- Among GPT's long Chinese tokens, over 23% are associated with gray content such as pornography or gambling, indicating a significant level of contamination in the model's vocabulary [7][8]
- The study quantifies that content related to the adult film star "波多野结衣" constitutes approximately 0.5% of the training data for GPT-4o, 2.6 times more frequent than the common greeting "你好" [10]

Group 2: Implications and Concerns
- The presence of PoC tokens poses a risk to AI, as these elements can become ingrained in the AI's knowledge base, potentially leading to nonsensical or irrelevant responses [10]
- The widespread existence of these tokens reflects serious challenges in the quality of the Chinese web corpus used for training large language models (LLMs) [13]
- The article suggests that the current state of AI training data may inadvertently promote inappropriate content, raising concerns about the implications for AI development and deployment [13]
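The vocabulary audit the researchers describe can be approximated by scanning a tokenizer's long Chinese tokens against a blocklist of gray-content keywords. The toy vocabulary and keyword list below are made-up stand-ins, not the paper's data; the actual study worked against GPT's real token table.

```python
def find_polluted_tokens(vocab, keywords, min_len=4):
    """Flag long tokens that contain any blocklisted keyword.
    A toy approximation of the 'PoC token' audit; vocab and
    keywords here are hypothetical examples."""
    flagged = []
    for token in vocab:
        # Only long tokens matter: multi-character merges signal that
        # the phrase appeared very frequently in the training corpus.
        if len(token) >= min_len and any(k in token for k in keywords):
            flagged.append(token)
    return flagged

# Hypothetical vocabulary entries and blocklist keywords.
toy_vocab = ["你好世界", "大发时时彩", "在线观看视频", "机器学习模型"]
toy_keywords = ["时时彩", "在线观看"]
print(find_polluted_tokens(toy_vocab, toy_keywords))
# -> ['大发时时彩', '在线观看视频']
```

The key intuition the sketch captures: a byte-pair tokenizer only merges a long phrase into a single token if that phrase is extremely common in the training data, so long polluted tokens are direct evidence of corpus contamination.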
China's Top 500 Service-Industry Companies Released; Huawei Unveils AI Chip Roadmap | Daily Finance Review
吴晓波频道· 2025-09-19 00:30
Group 1: Federal Reserve and Economic Policy
- The Federal Reserve announced a 25 basis point rate cut, lowering the target range from 4.25%-4.5% to 4.00%-4.25%, its first cut of the year after a total reduction of 125 basis points since last September [2][3]
- The Fed's statement highlighted a slowdown in job growth and a slight increase in the unemployment rate, indicating a cautious approach to future rate cuts amid rising inflation [2][3]
- Fed Chair Powell faces a challenging decision between maintaining higher rates to curb inflation or cutting rates to support the job market, with current economic indicators suggesting a need for preventive measures [2][3]

Group 2: Immigration and Service Industry Growth
- From January to August, the number of visa-free foreign entrants to China increased by 52.1% year-on-year, with a total of 15.89 million foreign visitors [4][5]
- The Chinese government is optimizing visa policies to attract more foreign visitors, which is expected to stimulate consumption and boost the service industry [4][5]
- The 2025 China Service Industry Top 500 report revealed total revenue of 51.1 trillion yuan, with average revenue per company exceeding 100 billion yuan, indicating strong growth in the service sector [6][7]

Group 3: AI Chip Development
- Huawei announced a three-year roadmap for its Ascend AI chip series, with plans to release four new chips between 2026 and 2028, emphasizing the use of self-developed high-bandwidth memory [8][9]
- The development of AI chips is seen as a strategic move to reduce reliance on foreign technology, with other Chinese companies like Alibaba and Baidu also accelerating their AI chip research [8][9]
- The DeepSeek team's research on a new language model was published in Nature, showcasing advancements in AI training methodologies and contributing to the global AI landscape [10][11]

Group 4: International Market Expansion
- Didi and Meituan are investing heavily in the Brazilian food delivery market, with Didi planning to invest 2 billion reais and Meituan committing 1 billion USD over five years [12][13]
- The competitive landscape in Brazil's food delivery market is intensifying, with both companies facing challenges from local giants like iFood [12][13]
- The entry of Chinese companies into the Brazilian market reflects a broader strategy to capture opportunities in Latin America, despite the challenges of local competition [12][13]

Group 5: Digital Asset Regulation
- The SEC has simplified the approval process for digital asset ETFs, reducing the timeline from 240 days to a maximum of 75 days, signaling a shift toward a more favorable regulatory environment for digital assets [14][15]
- This regulatory change aims to promote innovation while maintaining oversight, as the U.S. seeks to catch up with other financial hubs that have embraced digital currencies [14][15]
- The SEC's decision reflects a broader trend of increasing acceptance of digital assets within the U.S. financial system, potentially reshaping the competitive landscape for digital asset products [14][15]
Remote Banking: "Crossing Mountains and Seas" to Serve Within Arm's Reach
Zheng Quan Ri Bao· 2025-09-18 16:22
"我们观察到多家银行的AI业务已从'试试看'转为'必须做',其整体战略布局已被重构。远程银行不仅是 银行数字化转型成果的集中展现,更是其关键输出端口。它不再是成本中心,而是新的服务核心、营销 中心和价值创造中心。"蚂蚁数科副总裁余滨在接受《证券日报》记者采访时分享了他的见解。 从业者的直观感受,正是当前银行业数字化转型深入推进的真实写照。在数智驱动下,金融服务提质升 级,有力推动了银行跑出金融为民的"加速度"。作为数字化转型的"桥头堡",远程银行由信用卡中心、 电话银行中心、网络银行部等传统部门整合而成,形成独立的"远程银行部"或"线上客户经营中心",并 提升至与线下网点同等重要的战略层级,成为银行全面数字化转型的重要支点。 随着"数字中国"建设及做好"数字金融"大文章的深入推进,以创新为核心的新质生产力正迅速崛起,成 为推动金融高质量发展的核心动力。在新形势下,银行与用户的关系正在重塑,服务渠道与工具也在不 断更新迭代,一幅"新金融"的蓝图正徐徐展开。 从功能叠加 走向业务重构 余滨长期深耕在业务一线,致力于服务机构的远程银行建设。他向记者讲述:"如今,我们为银行提供 的AI应用已从最初的智能客服、知识问答 ...
Today's Autonomous-Driving VLA Still Has Many Modules to Optimize...
自动驾驶之心· 2025-09-18 11:00
Core Viewpoint
- VLA (Vision-Language-Action) is emerging as a mainstream keyword in autonomous driving, with rapid advancements in both academia and industry, aiming to overcome the limitations of traditional modular architectures and enhance the capabilities of autonomous systems [1][5]

Summary by Sections

VLA Research and Development
- The transition from traditional modular architectures to end-to-end models is marked by the introduction of VLA, which aims to map sensor inputs directly to driving commands, addressing previous bottlenecks in the development of autonomous driving systems [2][5]
- The VLA model leverages large language models (LLMs) to enhance reasoning, explanation, and interaction capabilities, making it a significant advancement in the field [5]

Traditional Modular Architecture
- Early autonomous driving systems (L2-L4) used a modular design in which each module (e.g., object detection, trajectory prediction) was developed independently, leading to issues such as error accumulation and information loss [3]
- The limitations of traditional architectures include reliance on manually designed rules, making it difficult to handle complex traffic scenarios [3][4]

Emergence of Pure Vision End-to-End Models
- The rise of pure vision end-to-end models, exemplified by NVIDIA's DAVE-2 and Wayve, aimed to simplify system architecture through imitation learning, but faced challenges with transparency and generalization in unseen scenarios [4][5]

VLA Paradigm
- The VLA paradigm introduces a new approach in which language serves as a bridge between perception and action, enhancing the model's interpretability and trustworthiness [5]
- VLA models can use pre-trained knowledge from LLMs to better understand complex traffic situations and make logical decisions, improving generalization to novel scenarios [5]

Course Objectives and Structure
- The course aims to provide a systematic understanding of VLA, addressing gaps in knowledge and practical skills, and includes a comprehensive curriculum covering various aspects of VLA research [6][12]
- The program consists of 12 weeks of online group research, followed by 2 weeks of paper guidance and an additional 10 weeks of paper maintenance, covering both theoretical and practical applications [7][30]

Enrollment and Requirements
- The course is designed for individuals with a background in deep learning and basic knowledge of autonomous driving algorithms, requiring familiarity with Python and PyTorch [16][19]
- Class size is limited to 6-8 participants to ensure personalized attention and effective learning [11]

Course Highlights
- Participants will gain insights into classic and cutting-edge papers, coding skills, and methodologies for writing and submitting research papers, enhancing their academic and professional profiles [12][15][30]
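The VLA idea above, with language as the bridge between perception and action, can be sketched as a minimal interface. Everything below (the class names, the command schema, the placeholder policy) is invented purely for illustration; a production VLA stack would put a fine-tuned multimodal LLM where the toy rule sits.

```python
from dataclasses import dataclass

@dataclass
class DrivingCommand:
    steer: float     # radians, negative = left
    throttle: float  # 0.0 to 1.0
    rationale: str   # language explanation: the "L" in VLA

class ToyVLA:
    """Illustrative VLA wrapper: visual features plus a language
    prompt go in; a driving command plus a textual rationale come
    out. The rationale is what gives VLA its interpretability."""
    def act(self, image_features, prompt: str) -> DrivingCommand:
        # Placeholder policy standing in for the LLM's reasoning:
        # slow down whenever the scene description mentions a pedestrian.
        if "pedestrian" in prompt.lower():
            return DrivingCommand(0.0, 0.1, "Pedestrian ahead; reducing speed.")
        return DrivingCommand(0.0, 0.5, "Clear road; maintaining speed.")

cmd = ToyVLA().act(image_features=None, prompt="Pedestrian crossing ahead")
print(cmd.throttle, "-", cmd.rationale)
```

The design point the sketch makes concrete: unlike a pure vision end-to-end model that emits only controls, a VLA system emits controls together with a language rationale, which is the property the article credits for improved interpretability and trust.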
DeepSeek Makes the Cover of Nature for the First Time: A Chinese Large Model Makes History, Doing What OpenAI Dared Not Do
36Kr· 2025-09-18 09:56
Core Insights
- DeepSeek's AI model, R1, has gained significant recognition by being featured on the cover of Nature, a prestigious scientific journal, highlighting its impact on the AI industry [2][10][12]
- The training cost for R1 was notably low at $294,000, which contrasts sharply with the multi-million-dollar investments typical for models from companies like OpenAI [7][48]
- The model's development process involved rigorous peer review, setting a new standard for transparency and scientific validation in AI [11][15][16]

Group 1: Model Development and Training
- DeepSeek R1's training process was detailed in a paper published on arXiv, which was later expanded upon in the Nature article, showcasing a comprehensive methodology [6][7]
- The model was trained using a pure reinforcement learning framework, allowing it to develop reasoning capabilities without relying on human-annotated data [19][41]
- R1 achieved an impressive accuracy of 77.9% on the AIME 2024 math competition, surpassing human average scores and even outperforming GPT-4 on certain tasks [23][31]

Group 2: Peer Review and Industry Impact
- The peer review process for R1 involved independent experts scrutinizing the model, a departure from the typical practices of major AI companies, which often do not submit their models for academic evaluation [10][11][15]
- Nature's editorial team has called for other companies to submit their models for peer review, emphasizing the importance of transparency and accountability in AI development [15][16]
- The recognition from Nature not only validates R1's scientific contributions but also positions DeepSeek as a leader in the push for more rigorous standards in AI research [12][50]

Group 3: Technical Innovations
- R1's architecture is based on a mixture-of-experts (MoE) model with 671 billion parameters, pre-trained on a vast dataset of web pages and e-books [25]
- The model's training rewarded it solely based on the correctness of its answers, fostering an environment for self-reflection and dynamic adjustment during problem-solving [29][38]
- The final version of R1 was developed through a multi-stage training process combining reinforcement learning with supervised fine-tuning, enhancing both reasoning and general capabilities [39][47]
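The multi-stage recipe described above (pure RL yielding an R1-Zero-style model, then supervised fine-tuning interleaved with further RL to produce the final R1) can be outlined as a pipeline skeleton. The stage functions below are hypothetical stubs written for illustration; they are not DeepSeek's code, and the real stages involve curated reasoning traces, reward models, and many engineering details omitted here.

```python
def train_r1_style(base_model: str, math_problems: list, sft_data: list):
    """Skeleton of a multi-stage pipeline in the spirit of the paper:
    stage 1 is pure RL with a correctness-only reward; later stages
    interleave supervised fine-tuning with more RL. All stubs are
    hypothetical and just record which stages were applied."""

    def rl_stage(model: str, problems: list) -> str:
        # Reward depends only on final-answer correctness; the model
        # is free to discover step-by-step reasoning on its own.
        return f"{model}+RL"

    def sft_stage(model: str, data: list) -> str:
        # Supervised fine-tuning on curated reasoning traces.
        return f"{model}+SFT"

    r1_zero = rl_stage(base_model, math_problems)  # stage 1: pure RL
    model = sft_stage(r1_zero, sft_data)           # stage 2: SFT
    model = rl_stage(model, math_problems)         # stage 3: further RL
    return r1_zero, model

zero, final = train_r1_style("base", [], [])
print(zero)   # -> base+RL
print(final)  # -> base+RL+SFT+RL
```

The point the skeleton makes explicit is the ordering: reasoning emerges first from outcome-only RL, and the supervised stages afterwards restore readability and general capability, rather than teaching the reasoning itself.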
DeepSeek Issues a Solemn Statement!
Zhong Guo Ji Jin Bao· 2025-09-18 08:37
Core Viewpoint
- DeepSeek has issued a statement regarding fraudulent activities in which criminals impersonate the company or its employees to scam users, severely harming user rights and the company's reputation [1][2]

Group 1: Fraudulent Activities
- Criminals have been using forged materials to solicit payments from users under the guise of "computing power leasing" and "equity financing" [1]
- DeepSeek emphasizes that it has never asked users to make payments to personal or unofficial accounts, and that any such request is fraudulent [2]
- The company urges users to verify information through its official website and certified accounts, as all official services are currently free [2]

Group 2: Company Background
- DeepSeek was established in 2023 and was incubated by the well-known quantitative investment firm High-Flyer Quant (幻方量化) [3]
- The founding team is led by quantitative expert Liang Wenfeng and includes top research talent from prestigious universities and experienced technical experts from international institutions [3]
- Recently, DeepSeek's research paper on DeepSeek-R1 was featured on the cover of the prestigious journal Nature, marking it as the first major language model to undergo peer review [3]
From ChatGPT to Marble: Is 3D World Generation the Next Breakout Fei-Fei Li Is Betting On?
锦秋集· 2025-09-18 07:33
Core Viewpoint
- The article discusses the launch of World Labs' latest spatial intelligence model, Marble, which allows users to generate persistent, navigable 3D worlds from images or text prompts, marking a significant advancement in spatial intelligence technology [1][2]

Summary by Sections

Marble's Features and Comparison
- Marble shows significant improvements over similar products in geometric consistency, style diversity, world scale, and cross-device support, allowing users to truly "walk into" AI-generated spaces [2]

Fei-Fei Li's Vision and the World Model Narrative
- Fei-Fei Li's approach emphasizes a transition from language understanding to world understanding, culminating in spatial intelligence as a pathway to AGI (artificial general intelligence) [3][6]

Limitations of LLMs
- While acknowledging the achievements of large language models (LLMs), Fei-Fei Li highlights their limitations in understanding the three-dimensional world, asserting that true intelligence requires spatial awareness [5][7]

The Necessity of Spatial Intelligence for AGI
- Spatial intelligence is deemed essential for AGI, as the real world is inherently three-dimensional, and understanding it requires more than two-dimensional observations [16]

Evolution of AI Learning Paradigms
- The article outlines three phases in the evolution of AI learning: supervised learning, generative modeling, and the current focus on three-dimensional world models, emphasizing the importance of data, computation, and algorithms [21][24]

Data Strategy for World Models
- A mixed approach to data collection is necessary for training world models, combining real-data acquisition, reconstruction, and simulation to overcome the scarcity of high-quality three-dimensional data [26]

Practical Applications and Development Path
- The initial focus for Marble's application is content production, transitioning later to robotics and AR/VR, with an emphasis on creating interactive 3D worlds for various industries [29][30]
DeepSeek Makes History! Chinese AI's "Nature Moment"
Zheng Quan Shi Bao· 2025-09-18 07:29
Core Insights
- The DeepSeek-R1 inference model research paper has made history as the first Chinese large-model research to be published in the prestigious journal Nature, marking significant recognition of China's AI technology on the global scientific stage [1][2]
- Nature's editorial highlighted that DeepSeek has closed the gap of independent peer review for mainstream large models, which had been lacking across the industry [2]

Group 1: Research and Development
- The DeepSeek-R1 paper underwent a rigorous peer review process involving eight external experts over six months, emphasizing the importance of transparency and reproducibility in AI model development [2]
- The paper disclosed significant details about training costs and methodologies, including a total training cost of $294,000 (approximately 2.09 million RMB) for R1, achieved using 512 H800 GPUs [3]

Group 2: Model Performance and Criticism
- DeepSeek addressed initial criticisms of the "distillation" method used in R1, clarifying that all training data was sourced from the internet without intentional use of outputs from proprietary models such as OpenAI's [3]
- Training took 198 hours for R1-Zero and 80 hours for R1, showcasing a cost-effective approach compared with other models that often cost tens of millions of dollars [3]

Group 3: Future Developments
- There is significant anticipation for the release of the R2 model, with speculation that delays may be due to computational limitations [4]
- The recent release of DeepSeek-V3.1 signals a move toward the "Agent" era, featuring a mixed inference architecture and improved efficiency, which has sparked interest in the upcoming R2 model [4][5]

Group 4: Industry Impact
- DeepSeek's adoption of UE8M0 FP8 Scale parameter precision in V3.1 suggests a shift toward domestic AI chips, potentially accelerating the development of China's computing ecosystem [5]
- The collaboration between software and hardware in DeepSeek's models is seen as a new paradigm in the AI wave, with expectations of significant performance improvements in domestic computing chips [5]
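The UE8M0 format mentioned above is a scale encoding with 8 unsigned exponent bits and no mantissa: each byte represents a power-of-two scale factor shared by a block of FP8 values. The sketch below assumes the conventional exponent bias of 127, as used for exponent-only scales in the OCP microscaling (MX) formats; whether DeepSeek's UE8M0 matches that convention in every detail is an assumption here, not something the article states.

```python
import math

def decode_ue8m0(byte: int) -> float:
    """Decode an 8-bit exponent-only scale: value = 2**(byte - 127).
    No sign bit, no mantissa bits; assumes bias 127."""
    if not 0 <= byte <= 255:
        raise ValueError("UE8M0 is a single byte")
    return 2.0 ** (byte - 127)

def encode_ue8m0(scale: float) -> int:
    """Encode a power-of-two scale back to a byte (bias 127)."""
    exp = int(math.log2(scale))
    if not -127 <= exp <= 128:
        raise ValueError("scale out of UE8M0 range")
    return exp + 127

print(decode_ue8m0(127))   # -> 1.0
print(decode_ue8m0(130))   # -> 8.0
print(encode_ue8m0(0.25))  # -> 125
```

Because the scale is a pure power of two, applying it to a tensor block is just an exponent shift rather than a full multiply, which is what makes such formats attractive for hardware, including the domestic chips the article alludes to.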