Reasoning Models
StepFun's Jiang Daxin: Multimodality Has Yet to See Its GPT-4 Moment
Hu Xiu· 2025-05-08 11:50
Core Viewpoint
- The multi-modal model industry has not yet reached a "GPT-4 moment"; the lack of an integrated understanding-generation architecture is a significant bottleneck for development [1][3].

Company Overview
- The company, founded by CEO Jiang Daxin in 2023, focuses on multi-modal models and has restructured internally, merging previously separate groups into a unified "generation-understanding" team [1][2].
- The company currently employs over 400 people, 80% of them in technical roles, and fosters a collaborative, open work environment [2].

Technological Insights
- An integrated understanding-generation architecture is deemed crucial for the evolution of multi-modal models, as it allows pre-training on vast amounts of image and video data [1][3].
- The company stresses that multi-modal capability is essential for achieving Artificial General Intelligence (AGI), asserting that any shortcoming in this area could delay progress [12][31].

Market Position and Competition
- The company has completed a Series B funding round of several hundred million dollars and is one of the few among the "AI six tigers" that has not abandoned pre-training [3][36].
- The competitive landscape is intense, with major players such as OpenAI, Google, and Meta releasing numerous new models, underscoring the urgency of innovation [3][4].

Future Directions
- The company plans to enhance its models by integrating reasoning capabilities and long-chain thinking, which are essential for solving complex problems [13][18].
- Future work will focus on achieving a scalable understanding-generation architecture in the visual domain, which remains a significant challenge [26][28].

Application Strategy
- The company adopts a dual strategy of "super models plus super applications," aiming to leverage multi-modal capabilities and reasoning skills in its applications [31][32].
- The focus on intelligent terminal agents is seen as a key area for growth, with the potential to enhance user experience and task completion through better contextual understanding [32][34].
A First Look at Sebastian Raschka's New Book "Reasoning From Scratch": Demystifying the Foundations of Reasoning Models
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article discusses advances in the reasoning capabilities of large language models (LLMs) and introduces the book "Reasoning From Scratch" by Sebastian Raschka, which aims to provide practical insights into building reasoning models from the ground up [2][5][59].

Group 1: Definition and Importance of Reasoning in LLMs
- In the context of LLMs, reasoning refers to the model's ability to generate intermediate steps before arriving at a final answer, often described as chain-of-thought (CoT) reasoning [8][10].
- The distinction between reasoning and pattern matching is crucial, as traditional LLMs rely primarily on statistical correlations rather than logical reasoning [23][25].
- Understanding reasoning methods is essential for enhancing LLMs' ability to tackle complex tasks, such as solving logic puzzles or multi-step arithmetic problems [5][39].

Group 2: Training Process of LLMs
- The typical LLM training process consists of two main phases: pre-training and fine-tuning [16][19].
- During pre-training, LLMs are trained on vast amounts of unlabelled text (up to several terabytes) to learn language patterns, which can cost millions of dollars and take months [17][21].
- Fine-tuning involves supervised fine-tuning (SFT) and preference fine-tuning to improve the model's ability to respond to user queries [20][21].

Group 3: Pattern Matching vs. Logical Reasoning
- LLMs learn to predict the next token from statistical patterns in the training data, which lets them generate coherent text but does not confer true understanding [23][24].
- Logical reasoning, in contrast, requires deriving conclusions step by step, identifying contradictions and causal relationships [25][26].
- The article highlights that most LLMs do not actively identify contradictions but instead rely on patterns learned from training data [30][34].
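At the prompting level, the chain-of-thought idea defined in Group 1 amounts to building the input string differently so the model emits intermediate steps before its answer. A minimal sketch; the trigger phrase and prompt layout are illustrative assumptions, not taken from the book:

```python
def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Build a prompt that either elicits intermediate reasoning steps
    (chain-of-thought) or asks the model for the answer directly."""
    if chain_of_thought:
        # A widely used CoT trigger phrase: the model is nudged to write
        # out intermediate steps before committing to a final answer.
        return f"Q: {question}\nA: Let's think step by step."
    return f"Q: {question}\nA:"

print(build_prompt("If a train covers 60 km in 1.5 hours, what is its average speed?"))
```

The direct variant (`chain_of_thought=False`) corresponds to the plain next-token pattern matching described in Group 3; the CoT variant trades extra output tokens for the intermediate steps.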
Group 4: Enhancing Reasoning Capabilities
- The reasoning capabilities of LLMs gained significant attention with the release of OpenAI's o1 model, which emphasizes a more human-like thought process [41][43].
- LLM reasoning can be enhanced through inference-time compute scaling, reinforcement learning, and knowledge distillation [44][46][48].
- Inference-time methods, in particular, improve reasoning ability without retraining the underlying model weights [46][48].

Group 5: Importance of Building Reasoning Models from Scratch
- Building reasoning models from scratch provides valuable insight into the capabilities, limitations, and computational trade-offs of LLMs [50][57].
- The shift toward reasoning models reflects a broader trend in the AI industry, emphasizing the need for models that can handle complex tasks effectively [52][55].
- Understanding the underlying mechanisms of LLMs and reasoning models is crucial for optimizing their performance across applications [57].
Six Domestic Reasoning Models Take On OpenAI?
创业邦· 2025-04-30 10:09
Source: 光子星球 (Photon Planet, WeChat ID: TMTweb) | Author: Hao Xin | Editor: Wang Pan | Image source: Midjourney

"DeepSeek-R1 is like the first satellite the Soviet Union rushed into orbit: a Sputnik moment opening a new era for AI."

Before the 2025 Spring Festival, DeepSeek burst over the world ahead of the New Year's Eve fireworks.

With only hours to go before the New Year's Eve dinner, an engineer at a domestic cloud-server provider was suddenly pulled into a work group and handed an urgent task: tune the chips quickly to support the newly released DeepSeek-R1 model. "From starting the integration to finishing it, the whole process took less than a week," the engineer told us.

On the second day of the lunar new year, the phone of an executive at a vendor doing Agent-to-B business rang off the hook. Clients' demands were blunt: verify the model's real performance immediately, and get deployment on the schedule as soon as possible.

Before the holiday there were many large models; after the holiday there was only DeepSeek. DeepSeek-R1 was a watershed that rewrote the narrative of China's large models.

Starting from OpenAI's November 2022 release of ChatGPT, built on GPT-3.5, China set out on the road of chasing OpenAI. In 2023, large models sprang up like bamboo shoots after rain; there was no AI without a large model, vendors raced one another, and the "battle of a hundred models" took shape. As one act ended and another began, the protagonist of 2024 became ...
Reasoning Models Can Be Stronger Without the Thinking Process, Says New Research from UC Berkeley and Others
量子位· 2025-04-29 08:02
Heng Yu, reporting from Aofeisi | QbitAI

Experimental data show that in low-resource settings (fewer tokens, fewer model parameters) or low-latency settings, the NoThinking method outperforms the Thinking method, achieving a better accuracy-latency trade-off than conventional explicit thinking. In other settings, NoThinking also beats Thinking on some datasets.

The truth is... reasoning models can reason effectively without long stretches of deliberation!

Sounds counterintuitive? The common impression is that reasoning models owe their power, and their accurate answers, precisely to lengthy reasoning. That process usually takes a long time, which means consuming a lot of compute. Some research has tried to improve reasoning efficiency, but most of it still relies on an explicit thinking process.

The latest results from a team at UC Berkeley and the Allen Institute break this stereotype: bypassing the "thinking" step with a simple prompt and generating the solution directly can be just as effective, or even better. The approach is called the "NoThinking" method.

"Thinking" vs. "NoThinking"

The team built the NoThinking method on top of the DeepSeek-R1-Distill-Qwen model. First, what distinguishes Thinking from NoThinking? Thin ...
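The "bypass thinking with a simple prompt" trick can be sketched as prompt construction. The tag names and the prefilled filler sentence below are assumptions modeled on R1-style chat templates, not quoted from the paper:

```python
def build_prompt(question: str, thinking: bool = True) -> str:
    """Sketch of Thinking vs. NoThinking prompting for an R1-style model
    that wraps its reasoning in <think>...</think> tags (assumed template)."""
    prompt = f"<|User|>{question}<|Assistant|>"
    if not thinking:
        # NoThinking: prefill a trivially short, already-closed thinking
        # block, so decoding skips the long deliberation and proceeds
        # directly to the final solution.
        prompt += "<think>Okay, I have finished thinking.</think>"
    return prompt

print(build_prompt("Solve x^2 - 5x + 6 = 0", thinking=False))
```

The model never "decides" to skip thinking; the closed `<think>` block in the prefix simply makes continuing with a solution the most likely completion, which is why the method needs no retraining.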
Altman Boasts "At or Near Genius Level": OpenAI's Blockbuster Release!
Zheng Quan Shi Bao· 2025-04-17 04:31
OpenAI has released its most intelligent reasoning models to date.

Today OpenAI released its two newest o-series reasoning models, o3 and o4-mini, the first in the o series that can use images in chain-of-thought reasoning, that is, "think with images." o3 is the company's most powerful flagship reasoning model, leading benchmarks across coding, math, science, and visual perception; o4-mini is a smaller model optimized for fast, efficient, cost-effective reasoning, offering better value for money.

A debut for visual reasoning, plus the ability to execute tasks autonomously

After the two o-series models were released, OpenAI CEO Sam Altman retweeted a tester's post and said the new models are "at or near genius level." Altman added that he expects to upgrade o3 to a professional o3-pro version in the coming weeks.

According to OpenAI, the newly released o3 and o4-mini are trained to think longer before responding. They are the smartest models the company has shipped so far and represent a major leap in ChatGPT's capabilities.

The reporter noted that during the half-hour online launch livestream, OpenAI President Greg Brockman, who had previously been on extended leave, also appeared as a presenter to introduce and demonstrate o3 and o4-mini.

According to the introduction and demos, o3 and o4-mini offer the following highlights: first, performance that is more ...
OpenAI to Release "o3 or o4-mini" as Early as This Week: Is "PhD-Level AI" Coming?
硬AI· 2025-04-15 15:34
Editor | 硬AI
Author | Li Xiaoyin

OpenAI's latest model achieves a breakthrough: the ability to generate original ideas.

According to media reports citing people familiar with the matter, OpenAI will release a new model, codenamed o3 or o4-mini, as early as this week. The model can not only summarize research papers and solve math problems, but also independently propose new ideas, connect concepts across different fields, and design innovative experiments.

The forthcoming model can reportedly draw on knowledge from multiple domains at once, such as physics, engineering, and biology, to offer interdisciplinary solutions, achieving results that would normally require scientists to collaborate across fields: in effect, "PhD-level AI."

OpenAI President Greg Brockman said at an "AI workshop" event in February:

"The direction we are really heading is developing models that can spend a lot of time thinking hard about important scientific problems. I hope that within the next few years this will make everyone 10x or 100x more productive."
Zhipu Wants to Ambush DeepSeek
Hu Xiu· 2025-03-31 12:39
Core Viewpoint
- The article discusses the competitive landscape between Zhipu and DeepSeek, highlighting Zhipu's recent product launches and pricing strategies aimed at challenging DeepSeek's dominance in the AI model market [2][10].

Product Launches
- On March 31, Zhipu launched the "AutoGLM Thinking Model" and the inference model "GLM-Z1-Air," claiming that Air can match the performance of DeepSeek's R1 with only 32 billion parameters versus R1's 671 billion [2].
- Zhipu's model is priced at 0.5 yuan per million tokens, roughly 1/30 of DeepSeek's price [2].

Market Dynamics
- The article notes a shift in the AI model industry, with some companies, including Baichuan Intelligence and Lingyi Wanwu (01.AI), undergoing strategic pivots or downsizing, indicating that investors are losing patience with AI startups [3][4].
- Despite the challenges, Zhipu continues to secure funding from state-owned enterprises, positioning itself as a leader among the "six small tigers" of the large-model sector [4][6].

Commercialization Challenges
- The commercialization of large models remains a significant hurdle for the industry; Zhipu acknowledges the need to pave the way for an IPO while facing uncertain market conditions [6].
- Zhipu is focusing on penetrating sectors including finance, education, healthcare, and government, while also establishing an alliance with ASEAN countries and Belt and Road nations for collaborative model development [6].

Strategic Positioning
- Zhipu's CEO emphasizes the company's commitment to pre-training models, even as industry trends move toward post-training and inference models [3][12].
- The company aims to balance its technological advancement with commercial strategy, ensuring that the two reinforce each other dynamically [21].
Future Outlook
- The article suggests that Zhipu is optimistic about achieving significant growth in 2025, expecting a tenfold increase in market opportunities while maintaining a stable commercialization strategy [22].
喝点VC | a16z's Internal Retrospective on DeepSeek: Reasoning-Model Innovation and a New AI Model Landscape Under a 20x Compute Challenge
Z Potentials· 2025-03-23 05:10
Core Insights
- The article discusses the emergence and significance of DeepSeek, a new high-performance reasoning model from China, highlighting its open-source nature and the implications for the AI landscape [3][4][12].

Group 1: DeepSeek Overview
- DeepSeek has drawn attention for its performance on AI model leaderboards, raising both interest and concern [3].
- The open-source release of the model's weights and technical details offers valuable insight into reasoning models and their future development [4][12].

Group 2: Training Process
- DeepSeek's training involves three main steps: pre-training on vast datasets, supervised fine-tuning (SFT) on human-generated examples, and reinforcement learning from human feedback (RLHF) [6][9][10].
- The training process is designed to enhance the model's ability to provide accurate and contextually relevant answers, moving beyond simple question-answering to more complex reasoning [11][12].

Group 3: Innovations and Techniques
- DeepSeek R1 represents the culmination of various innovations, including self-learning capabilities and multi-stage training that improves reasoning abilities [11][13][14].
- The model employs a mixture-of-experts (MoE) architecture, which allows for efficient training and high performance on reasoning tasks [15][30].

Group 4: Performance and Cost
- Training DeepSeek V3 cost approximately $5.5 million; the transition to R1 was less expensive because of its focus on reasoning and smaller-scale SFT [27][29].
- The performance of reasoning models has improved significantly, with DeepSeek R1 demonstrating capabilities comparable to leading models in the industry [31][35].

Group 5: Future Implications
- The rise of reasoning models like DeepSeek indicates a shift in the AI landscape, necessitating increased computational resources for inference and testing [31][34].
- The open-source nature of these models fosters innovation and collaboration within the AI community, potentially accelerating advancements in the field [36][39].
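The mixture-of-experts design mentioned in Group 3 can be sketched as a routing layer that sends each token through only its top-k experts. The tiny dimensions and the plain linear experts below are illustrative assumptions, not DeepSeek's actual configuration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class TinyMoE:
    """Minimal mixture-of-experts layer: route each token to its top-k experts."""

    def __init__(self, d_model=8, n_experts=4, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.router = rng.normal(size=(d_model, n_experts)) * 0.1
        # Each expert here is just a linear map d_model -> d_model.
        self.experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.1

    def __call__(self, x):  # x: (n_tokens, d_model)
        logits = x @ self.router                              # (n_tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]       # top-k expert ids per token
        gates = softmax(np.take_along_axis(logits, topk, axis=-1))  # renormalized weights
        out = np.zeros_like(x)
        for i, token in enumerate(x):
            for gate, e in zip(gates[i], topk[i]):
                # Only k of the n experts actually run for this token.
                out[i] += gate * (token @ self.experts[e])
        return out

moe = TinyMoE()
y = moe(np.ones((3, 8)))
print(y.shape)
```

Because each token activates only k experts, an MoE model can grow its total parameter count far faster than its per-token compute, which is the efficiency property the retrospective attributes to this architecture.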
Decoding Nvidia's Latest GPU Roadmap
半导体行业观察· 2025-03-20 01:19
Core Viewpoint
- High-tech companies consistently develop roadmaps to mitigate the risks of technology planning and adoption, especially in the semiconductor industry, where performance and capacity limitations can hinder business operations [1][2].

Group 1: Nvidia's Roadmap
- Nvidia has established an extensive roadmap spanning GPU, CPU, and networking technologies, aimed at the growing demands of AI training and inference [3][5].
- The roadmap indicates that the "Blackwell" B300 GPU will increase memory capacity by 50% and raise FP4 performance to 150 petaflops relative to previous models [7][11].
- The upcoming "Vera" CV100 Arm processor is expected to feature 88 custom Arm cores and double the NVLink C2C connection speed to 1.8 TB/s, enhancing overall system performance [8][12].

Group 2: Future Developments
- The "Rubin" R100 GPU will offer 288 GB of HBM4 memory and a 62.5% bandwidth increase to 13 TB/s, significantly improving performance for AI workloads [9][10].
- By 2027, the "Rubin Ultra" GPU is projected to reach 100 petaflops of FP4 performance with 1 TB of memory, a substantial advance in processing power [14][15].
- The VR300 NVL576 system, slated for 2027, is anticipated to deliver 21 times the performance of current systems, with total bandwidth of 4.6 PB/s [17][18].

Group 3: Networking and Connectivity
- The ConnectX-8 SmartNIC will operate at 800 Gb/s, double its predecessor's speed, enhancing network capabilities for data-intensive applications [8].
- NVSwitch 7 ports are expected to double bandwidth to 7.2 TB/s, enabling faster data transfer between GPUs and CPUs [18].

Group 4: Market Implications
- Nvidia's roadmap serves as a strategic tool to reassure customers and investors of its commitment to innovation and performance, especially as competitors develop their own AI accelerators [2][4].
- The increasing complexity of semiconductor manufacturing and the need for advanced networking solutions highlight the competitive landscape in the AI and high-performance computing sectors [1][4].
From Tencent and Baidu to Automakers and Brokerages: Why Does "Everything" Want to Integrate DeepSeek?
声动活泼· 2025-03-14 05:45
According to a Guotai Junan research report, demand to integrate DeepSeek's large models has surged in a short time since the model took off. From early February to now, internet giants such as Tencent, Baidu, and Alibaba have not only put DeepSeek models on their cloud-computing platforms; in consumer-facing businesses, even though each of these giants has its own large model, they have still connected some of their apps to DeepSeek. These include WeChat, with 1.38 billion monthly active users, and Baidu, which had previously been wary of AI search because of its impact on advertising revenue.

Beyond the internet giants, dozens of automakers including Geely and FAW-Volkswagen, mainstream phone makers such as Huawei, and the three major telecom operators have all completed integrations in short order. Even some banks, brokerages, public mutual funds, and various government departments in parts of the country have joined in. Some banks, for example, have applied DeepSeek to customer-facing intelligent customer service. The governments of Shenzhen, Guangzhou, Hohhot, Wuxi, and other cities have announced that they have integrated DeepSeek models into their government-affairs systems, hoping to improve administrative efficiency and the public's experience of government services.

So why do car brands, brokerages, and even governments all want to integrate DeepSeek?

▲ Geely Auto recently announced that its self-developed large model has completed deep integration with DeepSeek. | Source: Geely Auto Group WeChat official account

Caixin's reporting points out that Tencent and other large firms' active integration of Deep ...