Mistral

Choosing the Right Large Language Model: Llama, Mistral, and DeepSeek
36Kr· 2025-06-30 05:34
Core Insights
- Large Language Models (LLMs) have gained popularity and are foundational to AI applications, with uses ranging from chatbots to data analysis [1]
- The article analyzes and compares three leading open-source LLMs (Llama, Mistral, and DeepSeek), focusing on their performance and technical specifications [1]

Group 1: Model Specifications
- Each model series offers different parameter sizes (7B, 13B, up to 65-70B), with the parameter count directly determining the compute (FLOPs) required for inference [2] (the sketch after Group 3 works through this arithmetic)
- For instance, the 7B Llama and Mistral models require approximately 14 billion FLOPs per token, while the larger Llama-2-70B requires about 140 billion FLOPs per token, making it ten times more computationally intensive [2]
- DeepSeek offers a 7B version and a larger 67B version, the latter with computational requirements similar to Llama's 70B model [2]

Group 2: Hardware Requirements
- Smaller models (7B-13B) can run on a single modern GPU, while larger models require multiple GPUs or specialized hardware [3][4]
- For example, Mistral 7B requires about 15GB of GPU memory, while Llama-2-13B needs approximately 24GB [3]
- The largest models (65B-70B) necessitate 2-4 GPUs or dedicated accelerators due to their high memory requirements [4]

Group 3: Memory Requirements
- The raw memory required for inference grows with model size: 7B models occupy around 14-16GB and 13B models around 26-30GB [5]
- Fine-tuning requires additional memory for optimizer states and gradients, often 2-3 times the memory of the model itself [6]
- Techniques like LoRA and QLoRA are popular for reducing memory usage during fine-tuning by freezing most weights and training a small number of additional parameters [7]
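These figures follow from a standard back-of-envelope rule: a dense decoder costs roughly 2 FLOPs per parameter per generated token, and fp16 weights occupy 2 bytes per parameter. A minimal sketch of that arithmetic; the helper names and the 1.1x activation/KV-cache overhead factor are illustrative assumptions, not figures from the article:

```python
def inference_flops_per_token(n_params: float) -> float:
    """Rule of thumb: ~2 FLOPs per parameter per generated token."""
    return 2 * n_params

def inference_memory_gb(n_params: float, bytes_per_param: int = 2,
                        overhead: float = 1.1) -> float:
    """fp16 weights (2 bytes/param) plus a rough 10% allowance for
    activations and KV cache; the overhead factor is an assumption."""
    return n_params * bytes_per_param * overhead / 1e9

for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{name}: ~{inference_flops_per_token(n) / 1e9:.0f} GFLOPs/token, "
          f"~{inference_memory_gb(n):.0f} GB in fp16")
# 7B: ~14 GFLOPs/token, ~15 GB in fp16
# 13B: ~26 GFLOPs/token, ~29 GB in fp16
# 70B: ~140 GFLOPs/token, ~154 GB in fp16
```

The FLOP side of this estimate explains why a 70B model is roughly ten times slower per token than a 7B model on the same hardware; quantization (sketched later in this entry) shrinks the memory side.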
Group 4: Performance Trade-offs
- In production, there is a trade-off between latency (the time for a single input to produce a result) and throughput (the number of results produced per unit time) [9]
- For interactive applications like chatbots, low latency is crucial; for batch-processing tasks, high throughput is prioritized [10][11]
- Smaller models (7B, 13B) generally have lower per-token latency than larger models (70B), which may generate only a few tokens per second due to their higher computational demands [10]

Group 5: Production Deployment
- All three models are compatible with mainstream open-source tools and have active communities [12][13]
- Deployment options include local GPU servers, cloud inference on platforms like AWS, and even high-end CPUs for the smaller models [14][15]
- The models support quantization techniques, allowing efficient deployment and integration with various serving frameworks [16] (a hedged loading sketch follows at the end of this entry)

Group 6: Safety Considerations
- Open-source models lack the robust safety features of proprietary models, so deployments need their own safety layers [17]
- This may include content filtering systems and rate limiting to prevent misuse [17]
- Community efforts are underway to improve the safety of open models, but they still lag their proprietary counterparts in this regard [17]

Group 7: Benchmark Performance
- Despite their smaller size, these models perform well on standard benchmarks, with Llama-3-8B achieving around 68.4% on MMLU, 79.6% on GSM8K, and 62.2% on HumanEval [18]
- Mistral 7B scores approximately 60.1% on MMLU and 50.0% on GSM8K, while DeepSeek excels with 78.1% on MMLU and 85.5% on GSM8K [18][19][20]
- The performance of these models reflects significant advances in model design and training techniques, allowing them to compete with larger models [22][25]
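One concrete way to use the quantization support mentioned in Group 5 is 4-bit loading through Hugging Face Transformers with bitsandbytes. A minimal sketch, assuming the publicly available `mistralai/Mistral-7B-v0.1` checkpoint, a single CUDA GPU, and the `transformers`, `accelerate`, and `bitsandbytes` packages; exact memory savings vary by model and setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization cuts weight memory roughly 4x versus fp16,
# letting a 7B model fit comfortably on a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on available devices
)

prompt = "Explain the latency/throughput trade-off in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```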
Beijing Issues Document to Promote Deep Integration of AI, AIGC, and Other Technologies with the Gaming and Esports Industry; Game ETF (159869) Up 3.22%
Mei Ri Jing Ji Xin Wen· 2025-06-24 02:51
In early trading on June 24, the gaming sector continued to rally, and the Game ETF (159869) was notably strong, up 3.22% intraday. All of its holdings traded higher, with Bingchuan Network, Dianhun Network, Ourpalm, Shengtian Network, and Fuchun Co. among the top gainers.

On the AI application front: overseas, Mistral AI launched its Mistral Compute cloud platform in Europe; Google updated three versions of its Gemini 2.5 models; OpenAI's Sam Altman said GPT-5 is expected this summer; and Midjourney released V1, its first AI video generation model. Domestically, the Doubao desktop and web apps added an "AI podcast" feature; MiniMax released M1, billed as the world's first open-source large-scale hybrid-architecture reasoning model; Tencent Yuanbao added an AI coding mode; and MiniMax released the video generation tool Hailuo 02.

On the policy front, Beijing has issued a document promoting the deep integration of AI, AIGC, and other technologies with the gaming and esports industry. According to media reports, the Publicity Department of the CPC Beijing Municipal Committee and other departments issued the "Support Measures for Promoting the High-Quality Development of Beijing's Gaming and Esports Industry (Provisional)". The measures ...
Every Country Craves "Sovereign AI", Yet the Result Is Deeper Dependence on Major Powers
Fortune China· 2025-06-19 13:01
Air Street Capital founder Nathan Benaich. Image credit: courtesy of Air Street Capital

The UAE recently announced a $20 billion investment in OpenAI's "UAE Stargate" project. The project is billed as building the UAE's "sovereign AI", yet it depends entirely on American chips, software, and infrastructure. This is the paradox of so-called "sovereign AI": the harder countries strive for independence in AI, the deeper their dependence on major powers grows.

The UAE is not alone. From France to India, governments are spending heavily on "sovereign" AI models: France has produced the Mistral model, while India is heavily promoting BharatGPT. Every country claims to be pursuing strategic autonomy in AI, yet all of them rely on a globalized technology pipeline.

NVIDIA CEO Jensen Huang has promoted the notion of the "AI factory", meaning that data centers have become strategic infrastructure on a par with power plants or shipyards. But this is political language rather than a reflection of technical reality. The rhetoric ties AI to national autonomy regardless of whether a country's underlying AI systems are still foreign-made or remain deeply embedded in global supply chains. Merely calling a country's data centers "AI factories" does not confer sovereignty on them. It is rather like ...
High-Quality Data Synthesis Without Hundred-Billion-Parameter Models: This Open-Source Framework Lets Small Models "Team Up", with 7B Performance Approaching 72B
QbitAI· 2025-06-17 07:41
Core Viewpoint
- The GRA framework (Generator-Reviewer-Adjudicator), proposed by Shanghai AI Lab and Renmin University of China, enables small models to collaboratively generate high-quality training data without large-scale language model distillation [1][2][13]

Group 1: GRA Framework Overview
- GRA operates on the principles of "multi-party collaboration" and "role division", simulating a peer-review process to ensure data quality [7][12]
- The framework consists of three roles (Generator, Reviewer, and Adjudicator), each contributing to the data generation and evaluation process [8][9][10] (a hedged sketch of the loop follows at the end of this entry)

Group 2: Experimental Results
- GRA-generated data matches or exceeds the quality of data from single large language models across ten mainstream datasets, with significant performance improvements [2][14]
- The framework integrates five open-source small language models, demonstrating that collaboration among smaller models can yield results competitive with larger models [14][17]

Group 3: Performance Metrics
- GRA-generated data improved training performance by an average of 6.18% on LLaMA-3.1 and 11.81% on Qwen-2.5 compared to the original data [16]
- GRA's performance is only 0.59% below the Qwen-72B distilled version, and it outperforms that baseline by 8.83% when trained on Qwen-2.5 data [17]

Group 4: Advantages of GRA
- GRA increases data diversity and quality, filling gaps in the original seed data and providing broader semantic coverage [18]
- Data quality is validated through a robust review process, with over 87.3% of samples receiving high consistency scores [19]
- GRA-generated data presents higher task difficulty, increasing the effectiveness of training for small models [20]
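The article describes the pipeline but includes no code. A minimal sketch of how one Generator-Reviewer-Adjudicator round might be wired up; the scoring scale, thresholds, and prompts are hypothetical placeholders, not the GRA implementation:

```python
import statistics
from typing import Callable, Optional

# A "model" here is any callable mapping a prompt to text; in practice
# these would be several different small open-source LLMs.
Model = Callable[[str], str]

def gra_round(seed_task: str, generator: Model, reviewers: list[Model],
              adjudicator: Model, accept_threshold: float = 8.0,
              max_disagreement: float = 2.0) -> Optional[dict]:
    """Run one round: generate a sample, peer-review it, arbitrate disputes."""
    # 1. Generator: one small model drafts a new sample from a seed task.
    draft = generator(f"Write a new training sample inspired by: {seed_task}")

    # 2. Reviewers: several other small models score the draft independently.
    scores = [float(r(f"Rate this sample 1-10, reply with digits only:\n{draft}"))
              for r in reviewers]
    mean = statistics.mean(scores)
    spread = max(scores) - min(scores)

    # 3. Adjudicator: consulted only when the reviewers disagree strongly.
    if spread > max_disagreement:
        verdict = adjudicator(f"KEEP or DISCARD this sample?\n{draft}")
        if "KEEP" not in verdict.upper():
            return None
    elif mean < accept_threshold:
        return None  # reviewers agree it is low quality: drop it

    return {"sample": draft, "mean_score": mean, "spread": spread}
```

Samples that survive the filter would be pooled into the synthetic training set; the 87.3% consistency figure above suggests the adjudicator is needed for only a minority of samples.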
LLMs "Mix and Match" Math Problems: 45K Samples Unlock an 18% Gain, No More Rote Memorization | MathFusion
QbitAI· 2025-06-17 07:41
Contributed by the MathFusion team to QbitAI

Current data-generation methods for mathematics are usually limited to rewriting or transforming a single problem, like having a student repeatedly solve variants of the same exercise, while ignoring the intrinsic connections between different problems.

To break this limitation and teach large models to connect knowledge "in series" and "in parallel", teams from Shanghai AI Lab, Renmin University's Gaoling School, and others jointly proposed MathFusion, which strengthens large language models' math problem-solving through instruction fusion. Using only 45K synthetic instructions, MathFusion raised average accuracy across multiple benchmarks by 18.0 percentage points, demonstrating excellent data efficiency and performance.

[Figure: the closer to the upper left, the better the model performance and the higher the data efficiency.]

Core idea: three "fusion strategies"

MathFusion combines different math problems through three "fusion strategies", generating new problems that encapsulate the relations and structure of the originals (a hedged prompt sketch follows at the end of this entry).

Sequential Fusion: chains two problems together, with the answer to the first serving as an input condition of the second. As in a multi-step problem, the model must solve the first part before it can attempt the second, learning to handle dependencies between problems.

Parallel Fusion: fuses two similar problems, identifying and merging their mathematical concepts to pose a new ...
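A minimal sketch of how the first two fusion strategies might be phrased as data-generation prompts; the templates and function names are illustrative assumptions, not the MathFusion implementation:

```python
def sequential_fusion(problem_a: str, problem_b: str) -> str:
    """Chain two problems: the answer to A becomes an input of B."""
    return ("Combine the following into one multi-step problem, where the "
            "answer to Part 1 is used as a given quantity in Part 2.\n"
            f"Part 1: {problem_a}\nPart 2: {problem_b}")

def parallel_fusion(problem_a: str, problem_b: str) -> str:
    """Merge two similar problems by fusing their shared concepts."""
    return ("Identify the shared mathematical concepts in the two problems "
            "below and write one new problem that combines them.\n"
            f"Problem A: {problem_a}\nProblem B: {problem_b}")

# Each fused prompt would be sent to an LLM to synthesize a new instruction;
# per the article, ~45K such instructions were used for fine-tuning.
```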
Google Search Rolls Out Audio Overviews: AI-Generated Podcast-Style Summaries; Amazon to Invest AUD 20 Billion to Expand Australian Data Center Infrastructure | AIGC Daily
Cyzone· 2025-06-15 23:47
1. [Amazon to invest AUD 20 billion to expand Australian data center infrastructure] Amazon announced on June 14 local time that it plans to invest an additional AUD 20 billion from 2025 to 2029 to expand, operate, and maintain its data center infrastructure in Australia. This is the largest publicly announced global technology investment in Australia, and it will support growing demand for cloud computing and artificial intelligence and accelerate AI adoption. (Sohu)

2. [Guangxi launches construction of AI open innovation platforms] The "Work Guidelines for the Construction of Guangxi AI Open Innovation Platforms" have been officially released, together with the "Notice on Applications for the First Batch of 2025 Guangxi AI Open Innovation Platforms", marking the launch of Guangxi's AI open innovation platform initiative. The platforms fall into three categories: new AI research institutions in AI subfields; joint AI innovation centers in Guangxi's key industries and in areas of ASEAN-oriented cooperation; and joint AI laboratories built with universities, research institutes, and enterprises in ASEAN countries. (Cailianshe)

3. [Google Search rolls out Audio Overviews: AI-generated podcast-style summaries] Google Search has introduced a new feature, Audio Overviews, which uses Google's Gemini models to provide comprehensive, AI-generated audio summaries. Instead of combing through numerous search results, users can simply use Google Search to ...
Express | A $215 Million Bet on AI Slimming: Multiverse Compresses LLMs by 95%, Letting Llama Race on a Raspberry Pi
Z Potentials· 2025-06-13 03:17
Image source: Multiverse Computing

Spanish startup Multiverse Computing announced on June 12 that, on the strength of its "CompactifAI" compression technology, it has closed a large Series B round of EUR 189 million (about $215 million). The round was led by Bullhound Capital, a backer of companies including Spotify, Revolut, Delivery Hero, Avito, and Discord. Also participating were HP Tech Ventures, SETT, Forgepoint Capital International, CDP Venture Capital, Santander Climate VC, Toshiba, and the Basque venture capital group Capital Riesgo de Euskadi (Grupo SPR).

Multiverse says it holds 160 patents and serves 100 customers worldwide, including the Spanish utility Iberdrola, Bosch, and Bank of C ...
[Global Finance] France's Mistral AI and NVIDIA Announce Cloud Platform Partnership
Xin Hua Cai Jing· 2025-06-12 23:02
Xinhua Finance, Paris, June 12 (reporter Li Wenxin): French tech startup Mistral AI announced on June 11, at the Viva Technology innovation show in France, a partnership with US chipmaker NVIDIA to jointly build a cloud platform named Mistral Compute.

Mistral Compute is described as an integrated infrastructure product designed to support the development and deployment of large-scale AI models. Jointly designed by Mistral AI and NVIDIA, the platform will run on 18,000 Grace Blackwell "superchips"; among NVIDIA's most advanced chips, each unit is priced at roughly $30,000 to $70,000.

NVIDIA president and CEO Jensen Huang said the company will increase Europe's AI computing capacity tenfold within two years. In addition, it will invest "billions of dollars" in Europe over the coming years, deepen partnerships with European manufacturing giants, and help several European countries build technology hubs.

Mistral AI said in a press release that Mistral Compute aims to provide government agencies and enterprises with a complete AI infrastructure stack, including computing power, software solutions, cloud APIs, managed services, and on-premises deployment. All services will be hosted and operated in Europe.

The partnership shows that Mistral AI is no longer content merely to build AI models ...

(Source: Xinhua Finance)
NVIDIA CEO Jensen Huang: 20 AI Factories to Be Built in Europe; Quantum Computing at a Turning Point
Jing Ji Ri Bao· 2025-06-11 23:36
Core Insights
- NVIDIA plans to build 20 AI factories in Europe and establish the world's first "industrial AI cloud" in the region [1]
- Europe's AI computing capacity is expected to increase tenfold within two years [1]
- Quantum computing technology is at a turning point, with potential applications to significant global problems in the coming years [1]

Group 1: AI Infrastructure Development
- NVIDIA CEO Jensen Huang announced the construction of 20 AI factories in Europe [1]
- The first industrial AI cloud, equipped with 10,000 GPUs, will be established in Germany [1]
- NVIDIA is forming alliances with various European companies, including the French startup Mistral AI [1]

Group 2: Quantum Computing Advancements
- Huang expressed optimism about the rapid advancement of quantum computing, which has been in development for decades [1]
- Quantum computers can process certain workloads at speeds far beyond traditional computers thanks to their capacity for parallel computation [1]
- Following Huang's positive outlook, shares of quantum-technology companies rose, with Quantum Computing Inc. stock up 12.5% [1]
Microsoft-backed AI lab Mistral is launching its first reasoning model in challenge to OpenAI
CNBC· 2025-06-10 09:47
"We're announcing in a couple of hours our new reasoning model, which is very much competitive with all the others and has the specificity of being able to reason in multiple languages," CEO Arthur Mensch told CNBC's Arjun Kharpal onstage during a fireside chat at London Tech Week. Reasoning models are systems that can execute more complicated tasks through a step-by-step logical thought process. Mistral's new model "is great at mathematics [and] great at coding," according to Mensch. French founder of arti ...