AI科技大本营

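Inside Quark's First Gaokao College-Application Model: Distilling the Experience of Hundreds of Human Experts, with an Agent That Generates Complete Application Reports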
AI科技大本营· 2025-06-12 09:06
Core Viewpoint - Quark has launched what it describes as China's first large model for Gaokao (national college entrance examination) application planning, providing personalized decision support to students choosing universities and majors [1][3].
Group 1: Features of the Quark Gaokao Application Model
- The model is described as having expert-level decision-making capability, tailoring application advice to a student's scores, interests, family background, and regional preferences [3][4].
- It follows a plan, execute, check, and reflect reasoning process to generate complete reports covering application strategy, recommended schools, and recommended majors [3][4].
- The "Deep Search" function accepts complex queries, which the model breaks down into specific needs so that responses are targeted and in-depth [4][11].
Group 2: Training and Data Sources
- The model is built on a multi-stage training paradigm that combines self-supervised semantic modeling with expert-guided strategy refinement [7][9].
- The communication and decision-making processes of experienced application planners were structured, and thousands of real expert reasoning chains were converted into high-quality supervised training data [9][11].
- The model's knowledge base is described as the largest in China, covering over 2,900 universities and nearly 1,600 undergraduate programs [10][11].
Group 3: Optimization and Feedback Mechanism
- The model uses a closed-loop optimization mechanism that combines simulated application scenarios, expert feedback, and strategy scoring to keep refining its outputs [9][11].
- It aims to give every student and family a comprehensive reference by drawing on its strengths in information processing and in understanding user needs [11].
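The plan, execute, check, and reflect process mentioned in this summary can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not Quark's implementation: the `Candidate` type, every function name, the score-margin rule, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    school: str
    major: str
    cutoff: int  # hypothetical admission cutoff score


def plan(profile):
    # Break the request into ordered sub-tasks (hypothetical decomposition).
    return ["filter_by_score", "filter_by_interest"]


def execute(step, profile, pool):
    # Apply one sub-task to the current candidate pool.
    if step == "filter_by_score":
        limit = profile["score"] + profile["margin"]
        return [c for c in pool if c.cutoff <= limit]
    if step == "filter_by_interest":
        return [c for c in pool if c.major in profile["interests"]]
    return pool


def check(result, min_results):
    # Verify that the draft report contains enough recommendations.
    return len(result) >= min_results


def reflect(profile):
    # Relax a constraint after a failed check: widen the score margin by 10.
    return {**profile, "margin": profile["margin"] + 10}


def build_report(profile, pool, min_results=2, max_rounds=5):
    result = []
    for _ in range(max_rounds):
        result = pool
        for step in plan(profile):
            result = execute(step, profile, result)
        if check(result, min_results):
            break
        profile = reflect(profile)
    return result


# Hypothetical sample data, for illustration only.
pool = [
    Candidate("School A", "Computer Science", 620),
    Candidate("School B", "Computer Science", 640),
    Candidate("School C", "History", 600),
]
profile = {"score": 625, "margin": 0, "interests": {"Computer Science"}}
report = build_report(profile, pool)
print([c.school for c in report])
```

With this sample data the first two rounds return a single school, so the loop reflects twice before the check passes. A production agent would replace each function with model calls and retrieval over the knowledge base; only the loop structure is the point here.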
OpenAI's Open Play and Ambition: Behind "The Gentle Singularity"
AI科技大本营· 2025-06-11 08:30
Group 1
- The article's core argument is that although the future of AI is presented as a smooth, gradual transition, the industry itself is marked by intense competition and strategic maneuvering [1][5][9].
- OpenAI has launched its new reasoning model, o3-pro, which is reported to outperform competitors such as Google's Gemini 2.5 Pro and Anthropic's Claude Opus 4 [5][6].
- A fierce price war has followed: the price of the existing o3 model was cut by 80%, and o3-pro is priced 87% lower than its predecessor o1-pro, with the aim of capturing market share quickly [6][9].
Group 2
- The article contrasts the optimistic vision of a smooth transition with the aggressive tactics now used in the market, pointing to a gap between idealistic goals and real-world actions [9][10].
- Altman stresses that the alignment problem must be addressed first, so that AI systems follow humanity's long-term goals before they are deployed widely [10][27].
- The article acknowledges potential societal disruption from AI, such as job losses, while suggesting that rapidly growing wealth could make new social policies possible to discuss [12][23].
Group 3
- By the 2030s, intelligence and energy are expected to become abundant, removing two long-standing limits on human progress [3][21].
- The article discusses recursive self-improvement: advances in AI accelerate further AI research and development, which could compound into exponential growth in capability [22][25].
- The cost of intelligence is expected to approach the cost of electricity, making advanced AI systems more accessible and more integrated into everyday life [23][25].
ModelBest Releases MiniCPM4 On-Device Models: 5x Faster Long-Text Inference, 0.5B Model Sets a New SOTA
AI科技大本营· 2025-06-10 09:31
Core Viewpoint - The release of MiniCPM4.0 is presented as a significant advance in on-device (edge-side) models, with improvements in performance, speed, and storage efficiency, particularly for long-text processing [1][4][32].
Group 1: Model Performance and Efficiency
- MiniCPM4.0-8B is described as the first natively sparse model with 5% sparsity, reaching performance comparable to Qwen-3-8B while using only 22% of the training resources [2][5][6].
- MiniCPM4.0-0.5B is reported to outperform larger models such as Qwen-3-0.6B and Llama 3.2 at a stated training cost of just 2.7%, and to reach a speed of 600 tokens/s [2][5][9].
- The architecture delivers a 5x speedup in long-text inference and up to 220x in extreme scenarios, addressing the industry-wide problem of slow long-text processing [4][9][16].
Group 2: Technological Innovations
- The InfLLM sparse attention architecture reduces computational cost for long text by lowering sparsity from 40%-50% to 5% [18][19][20].
- MiniCPM4.0 uses CPM.cu, a self-developed three-tier inference framework optimized for edge devices, which achieves a 5x speedup [21][22].
- Quantization techniques, including P-GPTQ and BitCPM, reduce compute and memory requirements for deployment [23][24].
Group 3: Data and Training Efficiency
- The company emphasizes high-quality data and uses new dataset-construction methods that cut validation costs by 90% [29][30].
- The training strategy incorporates the upgraded Model Wind Tunnel v2, which optimizes hyperparameter configuration and improves GPU utilization [30][32].
- MiniCPM4.0's development reflects an effort to maximize the return on research investment through systematic improvements across data, training, and inference [28][32].
Group 4: Market Position and Future Directions
- MiniCPM4.0 has passed 10 million downloads across all platforms, indicating strong market acceptance [32].
- The company plans to keep raising model knowledge density and intelligence levels to support efficient development and large-scale application of on-device AI [32].
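To make the 5% figure concrete, the sketch below shows one generic way sparse attention can limit work: the context is split into blocks, each block is scored against the current query, and only the top 5% of blocks are attended to. This is an assumption-laden illustration of block-level top-k selection in general, not the InfLLM algorithm; the summary does not describe how InfLLM selects context, and `select_blocks`, `keep_ratio`, and the random data are hypothetical.

```python
import random


def select_blocks(query, block_keys, keep_ratio=0.05):
    """Return the indices of the highest-scoring key blocks for one query."""
    # Score each block by the dot product between the query and one
    # representative vector per block (an assumed scoring rule).
    scores = [sum(q * k for q, k in zip(query, key)) for key in block_keys]
    keep = max(1, round(len(block_keys) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


random.seed(0)
dim, num_blocks = 8, 200
query = [random.gauss(0, 1) for _ in range(dim)]
block_keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_blocks)]

selected = select_blocks(query, block_keys)
print(len(selected), "of", num_blocks, "blocks attended")
```

Full attention would then be computed only over tokens inside the selected blocks, so the attention cost for this query scales with roughly 5% of the context rather than all of it.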
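When AI Can Write Code and Fix Bugs, Is a Computer Science Major a "Trap" or a "New Opportunity" for Gaokao Applicants? | In-Depth Conversations with 6 Experts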
AI科技大本营· 2025-06-10 09:31
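Zhang Xuefeng, known as a "Gaokao application mentor", launched a 17,999-yuan application advisory service that sold out in under three minutes, which shows how heated the choice of major has become. Computer science and artificial intelligence are among the majors he recommends most often.
This year's Gaokao has ended, with 13.35 million candidates sitting the exam. If the exam is the candidates' battlefield, the question that families weigh again and again, and struggle to settle, comes afterward: which major to choose.
Author | Meng Yidan (梦依丹) Produced by | AI科技大本营 (ID: rgznai100)
This year, with the AI wave sweeping in, that choice carries even more confusion and uncertainty: when AI can write 25% of Google's new code and fix 52% of software bugs... when nearly all of Claude Code's code has been written and refactored, over and over, by Claude Code itself...
Not long ago, typing code line by line was a software engineer's daily routine. Under the traditional paradigm, programmers translated requirements into detailed logic flows and then implemented them line by line in a specific syntax.
Today, the programming paradigm built around large models is shifting from "writing code" to "writing intent". Programmers no longer build everything from scratch; they converse, negotiate, and iterate with AI in natural language to produce the final code step by step.
This shift changes not only how software is developed but also, quietly, the structure of development jobs.
The chief product officer of the company behind Claude, the currently popular AI coding agent, ...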
A Conversation with PyTorch Head Matt White: AI Applications Should Work Like Spring Rain, "Nourishing Silently"
AI科技大本营· 2025-06-09 10:41
Core Viewpoint - The article discusses the tension around what "openness" means in AI, highlighting "open-washing": organizations label their models open source while attaching restrictive licenses that limit real freedom of use [1][3][4].
Group 1: Open Source and AI
- The rise of open-source AI has created a self-accelerating virtuous cycle, but a quiet fight is under way over the definition of "openness" [1][4].
- Matt White introduced the Model Openness Framework (MOF) to clarify standards and to distinguish genuine open-source contributors [4].
- The OpenMDW License aims to give users of AI models maximum freedom, addressing the poor fit of traditional software licenses for AI [4][7].
Group 2: Global Engagement and Community
- PyTorch Day aims to foster a global movement. User engagement from China is significant: 70% to 80% of traffic to the documentation sites originates there [6].
- The event is a platform for showcasing open-source projects and for knowledge exchange among local engineers and researchers [11].
Group 3: Licensing and Usage
- Openness in AI should be judged through licensing, which determines what users can actually do with a model [7].
- Licenses designed for open models cover model architecture, weights, datasets, and documentation, which traditional software licenses do not [7].
Group 4: Collaboration and Standards
- Collaboration among tech giants and new entrants is essential to advancing open-source AI, with PyTorch serving as a trusted platform for cooperation [9][10].
- The Linux Foundation plays a key role in establishing neutral standards that support the long-term viability and broad acceptance of protocols [10].
Group 5: Future Trends and Education
- The rapid development of AI agents and architectures calls for a focus on open standards, with organizations such as PyTorch and the Linux Foundation playing central roles [10].
- Educators must adapt to the AI era by learning to integrate AI tools into teaching without compromising the development of core skills [13][14].
Group 6: Challenges and Responsibilities
- The article stresses the need to address the crisis of digital content authenticity as AI-generated content becomes increasingly hard to distinguish from real content [15].
- It also highlights the need for responsible AI practices, particularly regarding misinformation and potential misuse of the technology [15].
From "Solving by Memorization" to "Deep Reasoning": HKUST Launches UGMathBench, the First Dynamic Benchmark for Undergraduate Mathematics
AI科技大本营· 2025-06-09 10:41
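Mathematical reasoning is a key indicator of a model's intelligence and needs to be evaluated comprehensively and fairly. However, existing math benchmarks such as GSM8K and MATH are widely criticized for insufficient coverage and for vulnerability to data contamination: they either lack broad coverage of undergraduate-level problems or may be affected by test-set contamination.
To fill these gaps, a research team from the Hong Kong University of Science and Technology recently presented UGMathBench at ICLR 2025. It is described as the first diverse, dynamic benchmark for undergraduate mathematics, designed to evaluate LLM reasoning across undergraduate math topics. It provides dynamic and varied evaluation tools and, for the first time, brings math-reasoning evaluation into an era of "dynamic contamination control", marking a shift in LLM math-reasoning evaluation from "surface-level problem solving" toward "deep understanding".
Paper: https://arxiv.org/pdf/2501.13766
UGMathBench is ...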
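The idea behind a dynamic benchmark can be illustrated with a toy example: the same problem is posed in several versions with different parameters, so a model that has memorized one published answer does not get credit for the others. The sketch below is an assumption for illustration, not UGMathBench's actual data format or scoring code. The problem template, the two toy "models", and the rule that a problem counts as solved only when every version is answered correctly are all hypothetical here.

```python
# Parameter sets (a, n) for three versions of one problem template.
VERSIONS = [(3, 2), (5, 4), (7, 3)]


def make_variant(a, n):
    """Build one version of the problem and its reference answer."""
    question = f"What is the derivative of {a}x^{n} at x = 1?"
    answer = a * n  # d/dx (a * x**n) at x = 1 equals a * n
    return question, answer


def memorizing_model(a, n):
    # Always returns the answer to the first version, as if memorized.
    return 6


def reasoning_model(a, n):
    # Computes the answer from the parameters actually given.
    return a * n


def solves_all_versions(model):
    # Count the problem as solved only if every version is correct.
    return all(model(a, n) == make_variant(a, n)[1] for a, n in VERSIONS)


def average_accuracy(model):
    # Fraction of individual versions answered correctly.
    correct = sum(model(a, n) == make_variant(a, n)[1] for a, n in VERSIONS)
    return correct / len(VERSIONS)


for model in (reasoning_model, memorizing_model):
    print(model.__name__, solves_all_versions(model), average_accuracy(model))
```

The memorizing model still earns partial credit on per-version accuracy but fails the all-versions criterion, which is the kind of gap a dynamic benchmark is meant to expose.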
Claude Code's Lead Engineer Reveals How AI Is Reshaping Everyday Development
AI科技大本营· 2025-06-07 09:42
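AI is upending software development!
Original link: https://www.youtube.com/watch?v=Yf_1w00qIKc Editor | Meng Yidan (梦依丹) Produced by | AI科技大本营 (ID: rgznai100)
Anthropic recently published an in-depth conversation between its technical expert Boris Cherny (lead engineer of Claude Code) and Alex Albert, its head of external communications, covering the origins, core capabilities, usage tips, and future outlook of the AI coding tool Claude Code. From the universality of the terminal, to the capabilities unlocked by new models, to clever uses of the Claude.md file, a new era of programming assisted or even led by AI is arriving quickly.
In their conversation, Boris and Alex discussed Claude Code's product positioning, differentiating capabilities, and user experience in depth.
Conversation highlights:
This comes from how we engineers and researchers at Anthropic usually use ...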
A Conversation with BAAI's Wang Zhongyuan: The "Group Stage" of Embodied AI Has Only Just Begun, and Robots Need an "Android", Not an iOS
AI科技大本营· 2025-06-07 09:42
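When Wudao 1.0 was released, academia had not yet reached a consensus on whether large models are the technical route to AGI. Embodied AI is at the same stage today.
Author | Wang Qilong (王启隆) Produced by | AI科技大本营 (ID: rgznai100)
Beneath the large-model boom, a subtle sense of hitting a bottleneck is becoming an industry consensus.
"What used to be called the 'battle of a hundred models' was mostly competition among large language models," Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (BAAI), said in a conversation with CSDN on the eve of the BAAI Conference, going straight to the core of the problem. "Large language models are constrained by their reliance on internet data. Performance is still improving, but much more slowly than before."
Where is the way out? In Wang's view, for AI to break through its ceiling, after "reading ten thousand books" (internet data) it must "travel ten thousand miles" (the physical world).
This is not an isolated judgment. In March this year, NVIDIA CEO Jensen Huang pointed to the direction of AI's second half at the GTC conference: building "AI factories" and ushering in the era of "physical AI", so that AI leaves the screen and interacts with the real world.
As thinking converged, action followed. On June 6, at the BAAI Conference in Beijing, CSDN watched Wang give his answer in his keynote. If the "Wudao" (悟道) series of 2021 represented an exploration of the technical path (the "dao", or way), then the new "Wujie" (悟界) series he unveiled declares a new ambition: to use ...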
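"AGI May Arrive Within Five Years": AI Godfather Bengio Urges China and the US to Reach Consensus and Warns Against AI Becoming a Weapon in Human Hands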
AI科技大本营· 2025-06-06 10:18
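[Editor's note] Yoshua Bengio, one of the three giants of deep learning, a Turing Award winner, and an "AI godfather", said at the 2025 BAAI Conference in Beijing that the length of tasks AI can complete doubles every seven months, and that in about five years AI will reach human level. Artificial general intelligence (AGI) may arrive within five years, yet human society has not reached agreement on rules, legislation, or global governance.
Compiled by | Meng Yidan (梦依丹) Produced by | AI科技大本营 (ID: rgznai100)
Since ChatGPT burst onto the scene, AI has been on an accelerating trajectory. From first writing code and generating copy to now looking things up online and remotely controlling home appliances, it is long past being a chatbot for passing the time. It has begun to "think" through tasks on its own, can coordinate operations across multiple applications, and can even control computers and read and write databases. AI has gone from a behind-the-scenes algorithm to a personal assistant, and is evolving into an "agent" that carries out complex operations autonomously. Moving from "obeying" to "acting", it is becoming a versatile player that can really get things done.
He urged that we are in a critical window of time and must quickly establish verifiable, safe, and responsible control mechanisms.
At the start of his talk, Professor Bengio shared a candid personal reflection. He admitted that after trying ChatGPT himself and watching AI advance so quickly, he felt he had previously underestimated the risk of AI getting out of control. And one ...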
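The doubling claim in the editor's note can be turned into a quick calculation. The sketch below assumes only the two numbers quoted there (a seven-month doubling period and a five-year horizon) and that the trend continues unchanged. The excerpt does not give the current task length, so the result is a relative growth factor, not an absolute duration.

```python
DOUBLING_PERIOD_MONTHS = 7   # doubling period quoted in the editor's note
HORIZON_MONTHS = 5 * 12      # "about five years"

# Number of doublings that fit in the horizon, and the resulting growth.
doublings = HORIZON_MONTHS / DOUBLING_PERIOD_MONTHS
growth = 2 ** doublings

print(f"{doublings:.1f} doublings, about {growth:.0f}x longer tasks")
```

Under these assumptions, five years allows about 8.6 doublings, or roughly a 380-fold increase in the length of tasks AI can complete.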