Large Language Models
Understanding DeepSeek and OpenAI in One Article: Why Do Entrepreneurs Need Cognitive Innovation?
混沌学园· 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving AI landscape [1][2]

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission of counteracting the monopolistic power of major tech companies in AI, aiming for open and safe AI for all [9][10][12]
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15]
- OpenAI's belief in the Scaling Law led to unprecedented investment in AI, resulting in groundbreaking language models that exhibit emergent capabilities [17][19]

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, lowering the barrier to AI usage [22][24]
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future development [25]

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, in contrast with the resource-heavy approaches of larger AI firms [32][34]
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing quality data selection and algorithmic efficiency [36][38]
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities developed without human feedback, marking a significant advance in AI technology [45][48]
Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing open collaboration and resource sharing among researchers [54][56]
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62]
- The company's approach challenges traditional hierarchical models, advocating a culture that empowers individuals to explore and innovate freely [64][70]

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78]
- The article calls for a departure from the belief that Chinese companies should focus only on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82]
A 0.5B Model Punches Above Its Weight for a New On-Device SOTA: Runs on a 4090, 5x Acceleration for Long-Text Processing | Open-Sourced by Tsinghua & ModelBest
量子位· 2025-06-10 07:35
Contributed by Tsinghua University & ModelBest (面壁智能) to 量子位 | WeChat official account QbitAI

A new champion of on-device cost-effectiveness: Tsinghua University and the ModelBest team have open-sourced a new model, MiniCPM4, available in 8B and 0.5B parameter sizes. Using only 22% of the training cost of comparable open-source models, it reaches the best performance in its class.

MiniCPM4-8B is the first open-source natively sparse model; with an extreme 5% sparsity, it lets long-text and deep-reasoning workloads truly run on device. On benchmarks such as MMLU, CEval, MATH500, and HumanEval, with only 22% of the training cost, its performance matches Qwen-3-8B and surpasses Gemma-3-12B.

MiniCPM4-0.5B also punches above its weight: on MMLU, CEval, BBH, HumanEval, and other benchmarks it outperforms the same-class Qwen-3-0.6B, Llama 3.2, and Gemma3, and through native QAT it achieves nearly lossless int4 quantization and inference speeds of 600 tokens/s.

On common edge chips such as the Jetson AGX Orin and RTX 4090, MiniCPM4 delivers 5x standard acceleration for long-text processing and up to 100x in extreme scenarios.

The team has publicly released the technical report ...
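As context for the "nearly lossless int4" claim: quantization-aware training keeps the quantizer in the training loop, but the arithmetic at the end is the familiar scale-and-round mapping to 4-bit integers. A minimal post-training sketch (illustrative only; not the MiniCPM4 implementation, whose QAT recipe and kernels differ):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor quantization to the int4 range [-8, 7]."""
    scale = np.abs(w).max() / 7.0                      # largest weight maps to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)     # True
```

The per-tensor scale here is the simplest choice; per-channel or per-group scales reduce error further, and QAT goes one step beyond by letting the weights adapt to the rounding during training.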
Apple (AAPL.O): Apple Intelligence will support more languages this year, and Apple will allow direct access to the on-device large language model at the core of Apple Intelligence.
news flash· 2025-06-09 17:31
Peking University and SIAT Shenzhen Jointly Launch a Synthetic-Biology AI Large Language Model, Obtaining a High-Performance Capping Enzyme with Catalytic Efficiency 2x That of Commercial Enzymes
Natural genomes encode a vast number of functional genes. Through long evolutionary selection, these genes have come to occupy a broad sequence space and have developed exquisitely diverse functional activities, giving organisms unique advantages for survival and reproduction in complex environments.

With billions of biological sequences now accumulated through sequencing, these latent functional genes form a "treasure trove" of genetic parts for biomanufacturing and synthetic biology. However, although natural genes hold extremely rich functions and application potential, only a small fraction of popular functional genes (such as gene-editing tool enzymes) have been annotated to high quality with sequence or structural models. As a result, gene-mining and protein-design methods based on sequence, structure, or deep learning cannot be extended to genes with complex functions, limiting the discovery and exploitation of high-value genetic parts.

[SynBioCon] has learned that, to address this problem, the team of Qian Long at Peking University's Center for Quantitative Biology recently released SYMPLEX, a large language model aimed at synthetic-biology part mining and biomanufacturing applications. By combining domain-specific LLM training, alignment with synthetic-biology expert knowledge, and large-scale bioinformatic analysis, the model automatically mines functional gene parts from massive bodies of literature and accurately recommends their engineering potential.

In addition, in collaboration with researcher Lou Chunbo of the Shenzhen Institute of Advanced Technology (CAS), the team applied SYMPLEX to mining capping enzymes, a key enzyme class for mRNA vaccine biomanufacturing, successfully obtaining several high-performance novel ...
AI Roundup: Google Updates Gemini 2.5 Pro, Alibaba Open-Sources New Qwen3 Model
China Post Securities· 2025-06-09 11:39
Securities research report: financial engineering, Research Institute
Analyst: Xiao Chengzhi (SAC No. S1340524090001, Email: xiaochengzhi@cnpsec.com)
Research assistant: Feng Yuwen (SAC No. S1340124100011, Email: fengyuwen@cnpsec.com)

Recent reports:
- "A GRU Model Combining Fundamental and Price-Volume Features" - 2025.06.05
- "Claude 4 Series Released, Google Launches Coding Agent Jules - AI Roundup 20250526" - 2025.05.27
- "Google Publishes an Agent White Paper, Manus Fully Opens Registration - AI Roundup 20250519" - 2025.05.20
- "CSRC Amends the Restructuring Measures, Deepening M&A and Restructuring Reform - Micro-Cap Index Weekly 20250518" - 2025.05.19
- "Tongyi Qianwen Releases the Qwen-3 Model, DeepSeek Releases a Mathematical-Proof Model - AI Roundup 20250505" - 2025.05.06
- "Funds Added Nonferrous Metals, Autos, and Media in Q1 While Trimming Power Equipment, Food & Beverage, and Telecom - Public Fund 2025Q1 Report Review" - 2025.04.30
- "Pan-Consumer Names Open Up Limit-Up Streaks and Gain Ceilings, ETF Funds ...
From "Memorized Solving" to "Deep Reasoning": HKUST Releases UGMathBench, the First Dynamic Evaluation Benchmark for Undergraduate Mathematics
AI科技大本营· 2025-06-09 10:41
Mathematical reasoning is a key indicator of a model's intelligence and requires comprehensive, fair evaluation. However, existing math benchmarks such as GSM8K and MATH are widely criticized for insufficient coverage and susceptibility to data contamination: they either lack broad coverage of undergraduate-level mathematics or may be contaminated through test-set leakage.

To fill these gaps, a research team from the Hong Kong University of Science and Technology recently published UGMathBench at ICLR 2025: the first diverse, dynamic evaluation benchmark for undergraduate mathematics, designed to assess LLM reasoning across undergraduate math topics. It provides dynamic, diverse evaluation tools and, for the first time, brings mathematical-reasoning evaluation into the era of "dynamic contamination control", marking a shift in LLM math evaluation from "shallow problem-solving" to "deep understanding".

Paper: https://arxiv.org/pdf/2501.13766
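The "dynamic" part of the benchmark means each problem exists in multiple randomized versions, so a leaked test instance no longer guarantees the score. A toy sketch of that idea (the problem template, version count, and scoring function here are hypothetical stand-ins, not UGMathBench's actual design):

```python
import random
import re

def make_variant(problem_id, version):
    """One randomized version of a templated problem.
    (Hypothetical template; real benchmark templates are far richer.)"""
    rng = random.Random(problem_id * 1000 + version)
    a, b = rng.randint(2, 9), rng.randint(2, 5)
    question = f"Compute f'(1) for f(x) = {a}x^{b}."
    answer = a * b                       # f'(x) = a*b*x^(b-1), so f'(1) = a*b
    return question, answer

def effective_accuracy(model, problem_ids, n_versions=3):
    """Count a problem as solved only if every randomized version is answered
    correctly, so memorizing one leaked instance cannot inflate the score."""
    solved = 0
    for pid in problem_ids:
        variants = [make_variant(pid, v) for v in range(n_versions)]
        if all(model(q) == ans for q, ans in variants):
            solved += 1
    return solved / len(problem_ids)

# A model that genuinely computes the answer passes every version:
def oracle(question):
    a, b = map(int, re.search(r"(\d+)x\^(\d+)", question).groups())
    return a * b

print(effective_accuracy(oracle, [1, 2, 3]))   # 1.0
```

A model that had merely memorized one version's answer string would fail the re-randomized versions of the same problem, which is exactly the contamination signal a dynamic benchmark is built to expose.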
CVPR 2025 Highlight | AdaCM2: The First Cross-Modality Adaptive Memory Compression Framework for Extremely Long Video Understanding
机器之心· 2025-06-09 04:33
The first author of this paper, Man Yuanbin, is a former senior technical expert at Alibaba DAMO Academy and now a first-year PhD student working on efficient multimodal large-model inference and generation systems. The corresponding author is his advisor, Miao Yin, an assistant professor of computer science at UTA. Dr. Yin leads a seven-person research team focused on multimodal spatial intelligence systems, aiming to bring spatial AI to practice through joint software-system optimization.

In recent years, large language models (LLMs) have kept pushing the boundaries of multimodal understanding. Once language models can "watch videos", tasks such as video question answering, video summarization, and caption generation move toward a truly intelligent stage. But one practical problem remains: how can extremely long videos be understood efficiently?

To this end, a research team from the computer science department of the University of Texas at Arlington (UTA) proposed AdaCM2: the first cross-modality memory compression framework supporting extremely-long-video understanding. The work has been accepted at CVPR 2025 and selected as a Highlight paper (a 3% acceptance rate), demonstrating both technical novelty and practical value.

Paper title: AdaCM2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Paper: https://arxiv.o ...
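The framework's core idea, as the title suggests, is adaptive cross-modality memory reduction: cached visual tokens are pruned according to their relevance to the text. A toy single-step sketch of that idea (the paper's layer-wise, attention-based criterion is more involved; the cosine-similarity rule and names here are illustrative assumptions):

```python
import numpy as np

def reduce_memory(frame_tokens, text_query, keep_ratio=0.1):
    """Keep only the cached frame tokens most similar to the text query.
    A toy stand-in for cross-modal memory reduction on long videos."""
    sims = frame_tokens @ text_query
    sims /= np.linalg.norm(frame_tokens, axis=1) * np.linalg.norm(text_query) + 1e-8
    k = max(1, int(len(frame_tokens) * keep_ratio))
    keep = np.sort(np.argsort(sims)[-k:])      # top-k, kept in temporal order
    return frame_tokens[keep], keep

rng = np.random.default_rng(0)
memory = rng.standard_normal((1200, 64))       # tokens cached from a long video
query = rng.standard_normal(64)                # embedding of the user's question
kept, idx = reduce_memory(memory, query)
print(kept.shape)                              # (120, 64): 90% of the cache dropped
```

The payoff is that memory no longer grows linearly with video length: whatever the frame count, the cache is repeatedly compressed back to a budget proportional to `keep_ratio`.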
Embodied Intelligence Drives the Realization of Artificial General Intelligence
Group 1
- The core idea of embodied intelligence is that cognition is shaped by the agent's perception and actions: intelligence arises from the interaction between the agent's body and its surrounding environment, not from brain function alone [1][2]
- Embodied intelligence theory has profound implications across fields such as cognitive science, psychology, anthropology, and art, leading to sub-disciplines like embodied cognition and embodied psychology [1][2]
- The transition from traditional disembodied intelligence to modern embodied intelligence marks a significant shift in AI research, with the latter integrating physical interaction with the environment into learning and decision-making [2][3]

Group 2
- The history of artificial intelligence has evolved through three stages: the first generation focused on knowledge-based reasoning models, the second introduced data-driven models, and the third, marked by the emergence of large language models, represents a new phase of development [3][4]
- The introduction of large language models in 2020 enabled machines to interact freely with humans in open domains, a significant step toward general artificial intelligence [4][5]
- Despite advances in language generation, domain generality across tasks remains limited, particularly in complex areas like medical diagnosis, highlighting the need for embodied intelligence to bridge these gaps [5][6]

Group 3
- The concept of embodied intelligence was first proposed in robotics, emphasizing the importance of body-environment interaction in intelligent behavior [6][7]
- Embodied intelligence has driven advances in robotics, shifting from single-modal to multi-modal perception, which is crucial for applications like autonomous vehicles [8][9]
- Integrating the agent concept into embodied intelligence allows robots to combine thinking, perception, and action, facilitating tasks in both digital and physical worlds and making robot development more efficient through simulation [9]
Photonic Chips Are About to Take Off!
半导体行业观察· 2025-06-09 00:53
Core Viewpoint
- The rapid development of large language models (LLMs) is pushing contemporary computing hardware to its limits, prompting exploration of alternative computing paradigms such as photonic hardware to meet the growing computational demands of AI models [1][4]

Group 1: Photonic Hardware and Its Advantages
- Photonic computing uses light for information processing, offering high bandwidth, strong parallelism, and low thermal dissipation, all essential for next-generation AI applications [4][5]
- Recent advances in photonic integrated circuits (PICs) enable fundamental neural-network modules, such as coherent interferometer arrays and micro-ring resonator weight banks, that perform dense matrix multiplication and addition [4][5]
- Integrating two-dimensional materials such as graphene and transition metal dichalcogenides (TMDCs) into silicon photonic platforms enhances the functionality of modulators and on-chip synaptic elements [5][31]

Group 2: Challenges in Mapping LLMs to New Hardware
- Mapping transformer-based LLM architectures onto new photonic hardware is challenging, particularly the design of reconfigurable circuits for dynamic, input-dependent weight matrices [5][6]
- Realizing nonlinear functions and normalization in photonic or spintronic media remains a significant technical hurdle [5][6]

Group 3: Key Components and Technologies
- Photonic neural networks (PNNs) use optical devices such as micro-ring resonators and Mach-Zehnder interferometer arrays to perform efficient computation [9][13]
- Metasurfaces enable high-density parallel optical computation by modulating light with sub-wavelength structured materials [14][16]
- 4f optical systems implement linear filtering via Fourier transformation, integrating deep diffractive neural networks into optical architectures [20][21]
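The coherent interferometer arrays in Groups 1 and 3 build matrix multiplication out of 2x2 Mach-Zehnder interferometers (MZIs): each MZI applies a programmable 2x2 unitary, and triangular or rectangular meshes of them (the Reck and Clements layouts) compose arbitrary N x N unitaries. A toy numpy model of a single MZI, under one common convention (the phase placement varies between papers and devices):

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # ideal 50:50 beam splitter

def phase(t):
    """Phase shifter acting on the upper waveguide only."""
    return np.diag([np.exp(1j * t), 1.0])

def mzi(theta, phi):
    """2x2 MZI: an input phase shifter and an internal phase shifter
    sandwiched between two beam splitters."""
    return BS @ phase(theta) @ BS @ phase(phi)

U = mzi(0.7, 1.3)
print(np.allclose(U @ U.conj().T, np.eye(2)))    # True: lossless, hence unitary
```

Because each factor is unitary, the whole device is too: light amplitudes are redistributed between the two waveguides without loss, and sweeping theta tunes the splitting ratio continuously, which is what lets a mesh of MZIs realize any weight matrix (up to the diagonal scaling handled elsewhere on chip).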
Group 4: Integration of Two-Dimensional Materials
- Integrating graphene and TMDCs into photonic chips is crucial for high-speed, energy-efficient AI hardware, with applications in optical modulators, photodetectors, and waveguides [31][35][36]
- Graphene's exceptional optical and electronic properties, combined with the tunable bandgap of TMDCs, improve the performance of photonic devices for AI workloads [31][32]

Group 5: Future Directions and Challenges
- Scaling the integration of two-dimensional materials is challenging because of their fragility, requiring advances in transfer techniques and wafer-scale synthesis [45]
- Material stability and the complexity of integration with existing CMOS processes are critical factors to address before these technologies see widespread adoption [45][46]