Large Language Model

MiniMax-M1: Surpassing DeepSeek, with support for a million-token context
自动驾驶之心· 2025-06-21 13:15
Source: AIGC面面观 (author: 欠阿贝尔两块钱).

Main contributions:
1. Efficient hybrid architecture: MiniMax-M1 combines an MoE architecture with Lightning Attention and supports a context window of 1M tokens; at a generation length of 80K tokens it needs only 25% of the FLOPs of a conventional-attention model.
2. CISPO, an algorithm that surpasses DAPO: it improves RL efficiency by clipping the importance-sampling weights, achieving a 2x speedup over DAPO while avoiding the token-level clipping of methods such as PPO/GRPO, which discards the gradient contributions of low-probability tokens (see the sketch below).
3. Scalable context: supports extending the generation length from 40K to 80K tokens.

1. Hybrid attention architecture
Lightning Attention: adopts I/O-aware linear attention, using block-wise computation and memory optimization to reduce long ...
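To make contribution 2 concrete: where PPO/GRPO clip the token-level policy ratio (zeroing the gradient of any token whose ratio leaves the trust region), CISPO clips the importance-sampling weight itself and treats it as a detached constant, so every token keeps contributing a gradient. A minimal PyTorch sketch under that reading — tensor names and the clipping bound are illustrative, not MiniMax's released code:

```python
import torch

def cispo_loss(logp_new: torch.Tensor,
               logp_old: torch.Tensor,
               advantages: torch.Tensor,
               eps_high: float = 2.0) -> torch.Tensor:
    """One CISPO-style policy-gradient loss (illustrative sketch)."""
    # Importance-sampling ratio between the current and behavior policies.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the IS weight itself and detach it: the weight becomes a constant
    # multiplier, so gradients still flow through logp_new for every token,
    # unlike PPO/GRPO, where clipped tokens stop contributing gradients.
    weight = torch.clamp(ratio, max=eps_high).detach()
    # Weighted REINFORCE-style surrogate loss.
    return -(weight * advantages * logp_new).mean()
```

Detaching the clipped weight is the key design choice: the update stays a weighted REINFORCE step, so rare low-probability reasoning tokens are down-weighted rather than dropped from the gradient entirely.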
Sam Altman Says Meta Offered OpenAI Staffers $100 Million Bonuses
Bloomberg Television· 2025-06-18 08:04
So the figures are astronomical, and there is a way to make sense of this stuff. I'm going to use the acronym CDT: Chips, Data, Talent. Those are the three pillars of AI development. And basically, you can make sense of what all of these different AI firms are doing based on where they have weaknesses in those areas. So if you think about that, they have got the chips, they most definitely have got the data, but it's the talent where it's seeming like they're needing to pick up pace a little bit. ...
A 1,200-line comeback: DeepSeek engineer open-sources a lightweight vLLM with throughput approaching the original
机器之心· 2025-06-13 04:31
Core Viewpoint - vLLM is a high-performance, open-source LLM inference and serving engine developed at the University of California, Berkeley, aimed at improving inference speed and resource utilization, particularly memory efficiency, while remaining compatible with popular model libraries such as Hugging Face [2][3].

Group 1: vLLM and Nano-vLLM
- vLLM enables mainstream models such as GPT, Mistral, and LLaMA to run faster and consume fewer resources through its innovative attention mechanism, PagedAttention [3].
- Nano-vLLM, a lightweight implementation of vLLM developed by DeepSeek AI researcher Yu Xingkai, simplifies the code to under 1,200 lines [4][7].
- Nano-vLLM has gained over 200 stars on GitHub, indicating community interest and engagement [5].

Group 2: Features of Nano-vLLM
- Nano-vLLM offers three core capabilities (a usage sketch follows below):
1. Fast offline inference with performance comparable to vLLM [6].
2. A readable codebase with a simplified implementation [7].
3. An optimization suite including prefix caching, Torch compilation, and CUDA graphs [8].

Group 3: Benchmarking Results
- In benchmark tests, Nano-vLLM produced the same number of output tokens as vLLM but took slightly longer, reaching a throughput of 1314.65 tokens/s versus vLLM's 1353.86 tokens/s [9][11].
- The test configuration used an RTX 4070 GPU and the Qwen3-0.6B model, with input and output lengths sampled randomly between 100 and 1024 tokens [10].
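Offline inference with Nano-vLLM looks roughly like the following. The API mirrors vLLM's offline entry points as described in the project's README; the model path and sampling values below are placeholders:

```python
from nanovllm import LLM, SamplingParams  # API mirrors vLLM's offline interface

# Load a small model; the local path is a placeholder.
llm = LLM("/path/to/Qwen3-0.6B", enforce_eager=True, tensor_parallel_size=1)

# Sampling configuration comparable in spirit to the benchmark setup above.
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)

prompts = ["Explain PagedAttention in one sentence."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])
```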
Large models spontaneously form a "map of human thought"! A landmark Nature sub-journal study reveals brain-like mechanisms in multimodal large models
机器人圈· 2025-06-11 11:43
Core Viewpoint - The research, published in Nature Machine Intelligence, demonstrates that multimodal large language models (MLLMs) can develop human-like object concept representations, challenging the notion that these models merely mimic human language without genuine understanding [2][4].

Group 1: Research Findings
- The study analyzed 4.7 million behavioral judgments to construct a "concept map" of AI models, confirming that MLLMs form object concept representations similar to humans' [3][6].
- Using a sparse positive similarity embedding (SPoSE) method, the researchers identified 66 core cognitive dimensions, showing that both ChatGPT-3.5 and the multimodal Gemini model exhibit stable low-dimensional representation structures [9].
- MLLMs spontaneously formed 18 high-level object concept categories with a classification accuracy of 78.3%, approaching human accuracy of 87.1% [13].

Group 2: Methodology
- The research employed a novel "behavioral cognitive probe" method, integrating computational modeling, behavioral experiments, and neuroscience to analyze AI cognition [8].
- A triplet odd-one-out task was designed to assess how similarly AI and humans represent objects, enabling a comparative analysis of their decision-making (a toy simulation follows this summary) [5][31].

Group 3: Cognitive Dimensions
- The study assigned semantic labels to the models' cognitive dimensions, grouping them into dimensions tied to semantic categories, perceptual features, and physical components [17][19][20].
- MLLM representations correlated significantly with human brain activity patterns, particularly in regions responsible for processing faces, scenes, and bodies [23][24].

Group 4: Implications and Future Directions
- Applications include developing neurally aligned AI systems, probing the neural mechanisms of concept combination and reasoning, and enhancing brain-computer interface systems [35].
- Future work will extend to next-generation multimodal models and establish a cognitive benchmark platform to objectively assess AI's semantic understanding [35][36].
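The triplet odd-one-out task from Group 2 can be simulated against any embedding model: show three objects and pick as the "odd one out" the item least similar to the other two. A toy sketch with made-up vectors — the object names and embeddings are illustrative, not the paper's data:

```python
import numpy as np

def odd_one_out(emb: dict[str, np.ndarray]) -> str:
    """Return the item least similar to the other two (cosine similarity)."""
    names = list(emb)

    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The odd one out is the item whose exclusion leaves the most
    # similar remaining pair.
    return max(names, key=lambda n: cos(*[emb[m] for m in names if m != n]))

# Toy embeddings: two animals and a vehicle (values are illustrative).
emb = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
}
print(odd_one_out(emb))  # -> "car"
```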
WWDC25: Introducing the Foundation Models framework
Apple Developer· 2025-06-10 23:01
Core Functionality
- The FoundationModels framework provides access to the on-device large language model behind Apple Intelligence via a Swift API [1]
- It is optimized for content generation, text summarization, and analysis of user input [2]
- It enables features such as personalized search suggestions and dynamic dialog creation [1]

Privacy and Efficiency
- All data processed by the model remains private because it runs on-device [2]
- The model can operate offline [2]
- Because the model is integrated into the operating system, it does not increase app size [2]
One trick to mitigate LLMs' lopsided skills: adjust the training-set composition — the "secret recipe" is here | SJTU & Shanghai AI Lab et al.
量子位· 2025-06-10 07:35
Core Viewpoint - The IDEAL method, proposed by a joint team from Shanghai Jiao Tong University and Shanghai AI Lab, significantly improves the performance of large language models (LLMs) across domains by adjusting the composition of the supervised fine-tuning (SFT) training dataset [3][4].

Group 1: Methodology
- IDEAL starts from high-quality training data for each domain and models the optimization problem as minimizing validation loss [5].
- The quantity of training data in the SFT phase is not the key factor; rather, an appropriate distribution of data is what prevents the model's skills from becoming lopsided across domains [6][15].
- The research quantifies how data adjustments affect the optimized model's validation performance, providing the theoretical foundation for IDEAL (a simplified sketch of this loop follows the list below) [7].

Group 2: Computational Efficiency
- The paper uses K-FAC theory to approximate the inverse of the Hessian matrix, simplifying the computation and allowing the approach to scale to LLM parameter counts [8].

Group 3: Experimental Results
- Tested on the Llama 3.1 8B model, IDEAL significantly improved coding capability after just two iterations, regardless of the epoch count [10].
- The initial training distribution can always be improved further: IDEAL raised average results across benchmarks regardless of the starting distribution [11].

Group 4: Practical Applications
- IDEAL answers how to combine high-quality training data from multiple domains into a unified training set, eliminating the need for manual adjustments [14].
- The paper suggests setting the hyperparameter m to about 0.15, which balances optimizing the data distribution against overly aggressive steps [15].
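As referenced in Group 1, the iteration can be sketched generically: estimate how perturbing each domain's share of the SFT mixture changes validation loss, then take a capped reweighting step. The paper derives this with influence-style gradients and a K-FAC approximation of the Hessian inverse; the finite-difference estimator below is a simplified stand-in, and `train_and_eval` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def reweight_mixture(weights, train_and_eval, m=0.15, eps=0.05):
    """One IDEAL-style iteration: nudge domain shares to reduce val loss.

    weights        -- current share of each domain in the SFT mixture
    train_and_eval -- hypothetical helper: mixture weights -> validation loss
    m              -- cap on the relative change per step (paper suggests ~0.15)
    eps            -- perturbation size for the finite-difference estimate
    """
    weights = np.asarray(weights, dtype=float)
    base = train_and_eval(weights)
    grad = np.zeros_like(weights)
    for i in range(len(weights)):
        bumped = weights.copy()
        bumped[i] *= 1.0 + eps
        # Finite-difference proxy for d(val loss)/d(domain share); the paper
        # instead uses influence functions with a K-FAC Hessian inverse.
        grad[i] = (train_and_eval(bumped / bumped.sum()) - base) / (weights[i] * eps)
    # Move each domain share against its loss gradient, capped at a
    # relative change of m, then renormalize to keep a valid mixture.
    step = np.clip(-grad / (np.abs(grad).max() + 1e-12), -1.0, 1.0) * m
    new_weights = weights * (1.0 + step)
    return new_weights / new_weights.sum()
```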
Concord Healthcare Announces Official Release of the Proton Therapy Large Model
Prnewswire· 2025-05-29 20:30
Core Viewpoint - Concord Healthcare Group has made significant advancements in precise tumor diagnosis and treatment technology, particularly with the launch of its self-developed large language model (LLM) for proton therapy, which has been successfully implemented in Guangzhou Concord Cancer Hospital [1][2].

Company Overview
- Concord Medical Services Holdings Limited is a healthcare provider specializing in comprehensive oncology services, including cancer diagnosis, treatment, education, and prevention, with a focus on improving the quality and accessibility of cancer care across China [4].
- The company operates a network of self-owned cancer hospitals and clinics, equipped with advanced technology such as proton therapy systems, and aims to provide multidisciplinary cancer care [4].

Technology and Innovation
- The proton LLM developed by Concord Healthcare is the first of its kind in China, utilizing a robust tumor diagnosis and treatment technology system built on extensive data accumulated over the years, including nearly 10,000 high-quality radiotherapy cases [2].
- The model integrates data from Proton China and professional journal literature to enhance its training and effectiveness in patient treatment [2].

Market Position
- Concord Healthcare serves both cancer patients directly through its own medical institutions and indirectly through third-party medical institutions by providing medical equipment, software, and related services [5].
- The company has established a widespread network of enterprise customers, primarily hospitals, offering integrated oncology-related services, including sales and installation of medical equipment, management, technical support, and operating leases [5].
DeepSeek: Tracing the Technology and Exploring the Frontier (Report)
Zhejiang University· 2025-05-22 01:20
Zhejiang University DS Series: DeepSeek — Tracing the Technology and Exploring the Frontier
Speaker: Zhu Qiang, College of Computer Science and Technology, Zhejiang University; Provincial-Ministerial Collaborative Innovation Center for Artificial Intelligence (Zhejiang University)
https://person.zju.edu.cn/zhuq

Outline: I. Language models; II. Transformer; III. ChatGPT; IV. DeepSeek; V. New-generation agents

Language models, the ultimate goal — Language Modeling: for an arbitrary word sequence, compute the probability that the sequence forms a sentence. We deal with language models every day:
I saw a cat
I saw a cat on the chair
I saw a cat running after a dog
I saw a ca car
I saw a cat in my dream

Language models, the basic task — Encoding: making computers understand human language. For "She is my mom", one-hot encoding gives each word a vector with exactly one 1 and 0s everywhere else:
She 1 0 0 0
is  0 1 0 0
my  0 0 1 0
mom 0 0 0 1
What are the drawbacks of one-hot encoding?

Encoding: making computers understand human language — Word Embedding: A bottle of tez ...
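The drawback the slide is asking about: one-hot vectors are as long as the vocabulary, almost entirely zeros, and encode no notion of similarity (every pair of distinct words is orthogonal), which is exactly what word embeddings fix. A small NumPy sketch of both encodings:

```python
import numpy as np

sentence = ["She", "is", "my", "mom"]
vocab = {w: i for i, w in enumerate(sentence)}

# One-hot: one row per word, a single 1 per row, 0 elsewhere.
one_hot = np.eye(len(vocab))[[vocab[w] for w in sentence]]
print(one_hot)

# Every pair of distinct one-hot vectors has dot product 0:
# the encoding says nothing about which words are related.
print(one_hot[vocab["She"]] @ one_hot[vocab["mom"]])  # -> 0.0

# A dense embedding table (randomly initialized here): low-dimensional,
# and once trained, related words end up with similar vectors.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 3))
print(embedding[vocab["mom"]])
```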
Did Elon Musk Just Give Nvidia Investors 40 Billion Reasons to Cheer?
The Motley Fool· 2025-05-16 21:00
Elon Musk's AI start-up appears to be eyeing more Nvidia GPUs. When it comes to training generative AI models, Nvidia's (NVDA 0.28%) graphics processing units (GPUs) are hailed as the gold standard among industry experts. That's not exactly a novel conclusion considering the semiconductor powerhouse has amassed an estimated 90% or more of the GPU market. The more subtle idea here is how exactly Nvidia built such a gigantic lead over the competition. While it does not explicitly specify which companies buy its ...
Meta delays release of flagship ‘Behemoth' AI model as engineers struggle: report
New York Post· 2025-05-15 23:15
Core Insights
- Meta Platforms is delaying the release of its "Behemoth" AI model due to concerns about its capabilities and the significance of improvements over earlier versions [1][3]
- The initial release was scheduled for April to align with Meta's first AI conference but has now been postponed to fall or later [2][3]

Development Timeline
- Behemoth was originally set for an April release, which was later pushed to June, and is now delayed further [2][3]
- The company had previously described Behemoth as "one of the smartest LLMs in the world" and its most powerful model to date [3][5]

Recent Developments
- In April, Meta released the latest versions of its LLM, Llama 4 Scout and Llama 4 Maverick, while previewing Behemoth [5]