歸藏的AI工具箱 (Guizang's AI Toolbox)
AI Needs to "Take Notes" Too: The Future Karpathy Sees in Claude's 16,000-Word System Prompt
歸藏的AI工具箱· 2025-05-12 08:28
Core Viewpoint: The article discusses the significance of system prompts in large language models (LLMs), focusing on Claude's extensive system prompt and on a new learning paradigm that Karpathy calls "system prompt learning" [6][12].
Group 1: System Prompts Overview
- Claude's system prompt runs to 16,739 words, far longer than the 2,218-word system prompt of OpenAI's o4-mini in ChatGPT, which is only about 13% of Claude's length [2][3].
- System prompts act as an initial instruction manual for an LLM, setting its role, rules, and response style [4].
- Claude's system prompt includes tool definitions, user preferences, and guidelines for various tasks, indicating a structured approach to AI interactions [8].
Group 2: Current Learning Paradigms
- Existing learning paradigms for LLMs include pretraining, which provides broad knowledge from large datasets, and finetuning, which adjusts model behavior through parameter updates [9].
- Humans, by contrast, often learn by summarizing experiences and strategies, akin to "note-taking", rather than relying solely on anything like parameter updates [10].
Group 3: System Prompt Learning
- Karpathy suggests that LLMs adopt a "system prompt learning" mechanism that stores strategies and knowledge in explicit text form, improving efficiency and scalability [10][12].
- This paradigm could make data use more efficient and improve LLMs' ability to generalize [19].
Group 4: Practical Implications
- Clear, detailed instructions in system prompts lead to more accurate AI responses, underscoring the importance of structured communication [13][14].
- The article argues that "prompt engineering" is an extension of everyday communication skills, which makes it accessible to ordinary users [16].
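The "note-taking" idea can be made concrete with a small sketch. This is a hypothetical illustration of system prompt learning, not Karpathy's proposal in code and not how Anthropic builds Claude's prompt: the class `SystemPromptNotebook`, its `max_notes` cap, and the "Lessons learned so far" heading are all assumptions. Lessons are kept as explicit text and appended to a base system prompt instead of being baked into weights.

```python
from dataclasses import dataclass, field


@dataclass
class SystemPromptNotebook:
    """Hypothetical sketch: learned strategies live as plain-text notes that
    are appended to the system prompt, rather than as weight updates."""

    base_prompt: str
    notes: list[str] = field(default_factory=list)
    max_notes: int = 20  # assumed cap so the prompt cannot grow without bound

    def add_note(self, lesson: str) -> None:
        """Record a strategy learned from experience (the note-taking step)."""
        lesson = lesson.strip()
        if lesson and lesson not in self.notes:  # skip empty and duplicate notes
            self.notes.append(lesson)
        # Keep only the most recent notes once the cap is exceeded.
        self.notes = self.notes[-self.max_notes:]

    def render(self) -> str:
        """Assemble the system prompt that would be sent to the model."""
        if not self.notes:
            return self.base_prompt
        learned = "\n".join(f"- {note}" for note in self.notes)
        return f"{self.base_prompt}\n\nLessons learned so far:\n{learned}"
```

For example, after `nb = SystemPromptNotebook("You are a helpful assistant.")` and `nb.add_note("When counting letters in a word, spell the word out first.")`, `nb.render()` returns the base prompt followed by that lesson as a bullet. Because the notes are plain text, a person can read, edit, or delete them, which is the practical difference from a parameter update.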
Using a Video as the Reference for Web Page Generation? How to Use Gemini 2.5's Most Powerful Capability
歸藏的AI工具箱· 2025-05-09 08:34
Core Viewpoint: The article highlights the advanced capabilities of Gemini 2.5 Pro 0506, particularly its ability to generate high-fidelity web effects from uploaded videos of interactions, showing significant improvements in front-end development and user interface design [1][4].
Group 1: Version Overview
- Gemini 2.5 Pro 0506 was released on May 6, 2025, ahead of the Google I/O conference [4].
- The main updates are substantial improvements in front-end and user interface development, as well as in basic coding tasks such as code transformation and editing [4].
Group 2: Testing and Capabilities
- Initial tests showed that Gemini can recreate interactive web pages from videos, thanks to its strong multimodal video understanding [5][6].
- Further tests revealed that while Gemini performs well at generating interactive animations, it may overlook finer details such as color changes and spacing [7][8].
Group 3: Usage Guidelines
- A prompt template is provided, emphasizing the need to describe the key animation effects and the details Gemini is likely to miss [10][11].
- Users are advised to upload videos to AI Studio for best results, compressing them and keeping them short so they do not use up too much context [13].
Group 4: Conclusion and Community Engagement
- The article encourages users to explore Gemini's capabilities beyond simple animations and invites community discussion of further applications [14].
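A prompt in the spirit of the recommended template might be assembled as below. This is a hypothetical sketch: the function name and the wording are assumptions, not the article's actual template. What it illustrates is the structure the guidelines describe: list the key effects explicitly, then call out the fine details (such as color changes and spacing) that Gemini tends to miss.

```python
def build_video_to_web_prompt(effects: list[str], easily_missed: list[str]) -> str:
    """Hypothetical prompt builder to send alongside an uploaded video.
    The wording is illustrative, not the article's template."""
    lines = [
        "Recreate the interactive web effect shown in the attached video "
        "as a single HTML file.",
        "Key animation effects to reproduce:",
    ]
    # Number the main effects so the model treats each one as a requirement.
    lines += [f"{i}. {effect}" for i, effect in enumerate(effects, start=1)]
    # Spell out details the model is known to overlook, e.g. colors and spacing.
    if easily_missed:
        lines.append("Pay special attention to these details:")
        lines += [f"- {detail}" for detail in easily_missed]
    return "\n".join(lines)
```

A call such as `build_video_to_web_prompt(["cards flip in 3D on hover"], ["hover background shifts from grey to blue", "12px gap between cards"])` yields a prompt to paste into AI Studio together with a short, compressed screen recording.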
The ChatGPT Moment for Designers: Figma Turns "Design as Code" into Reality
歸藏的AI工具箱· 2025-05-08 08:55
Core Viewpoint: The article discusses the two main categories of AI programming products that emerged after Cursor and AI programming became popular, highlighting their distinct functionality and target audiences [1].
Group 1: AI IDE Products
- AI IDE products such as Cursor and Windsurf have all the capabilities of a traditional IDE; AI features such as code completion, chat, and agents are supplementary, and users can still write code without them [2].
- These products target users who already know how to code, offering a traditional development experience enhanced by AI tools [2].
Group 2: Vibe Coding Products
- Vibe Coding products such as v0 and Lovable rely primarily on dialogue with an AI coding agent, with limited ability to view and edit code directly [3].
- Lovable-type products have a broader user base because users describe their needs in natural language, which makes them accessible to non-developers [5].
- However, Vibe Coding products struggle to translate designs accurately into code, especially nuanced details that are hard to describe in words [5].
Group 3: Figma's Role in Vibe Coding
- Figma is positioned as a key player in Vibe Coding, using its existing ecosystem to turn design files into code. Its CEO emphasizes "Design as Prompt": a design file serves as a precise prompt for code generation [7].
- Figma's new product, Figma Make, lets users import design files directly and generate web pages, significantly improving the expressiveness of the generated output [10].
Group 4: User Interaction and Iteration
- Figma Make supports direct interaction with design elements, allowing precise modifications without lengthy back-and-forth with the AI [11][12].
- It also integrates advanced capabilities, such as embedding maps and using 3D materials, in web pages created from design files [14][16].
Group 5: Future Implications
- Figma Make is expected to expand designers' responsibilities as they engage more with code, with roles such as Prompt Engineer emerging [19].
- The article argues that Figma's AI strategy is more coherent than that of competitors such as Adobe, reflecting a clear sense of what to innovate and what to keep [19].
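To illustrate why a design file can act as a more precise prompt than a verbal description, the sketch below serializes a simplified layer tree into exact sizes, colors, and padding. The `DesignNode` schema and `node_to_prompt` function are hypothetical and deliberately minimal; this is not the Figma file format, the Figma API, or how Figma Make works internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DesignNode:
    """Hypothetical, simplified stand-in for a design-file layer
    (NOT the Figma file format or API)."""
    name: str
    width: int
    height: int
    fill: Optional[str] = None  # hex color such as "#1E1E1E"
    padding: int = 0            # in px
    children: list["DesignNode"] = field(default_factory=list)


def node_to_prompt(node: DesignNode, depth: int = 0) -> str:
    """Turn the layer tree into exact constraints a code generator can follow,
    the kind of detail that is hard to convey in a chat description."""
    indent = "  " * depth
    parts = [f"{node.width}x{node.height}px"]
    if node.fill:
        parts.append(f"fill {node.fill}")
    if node.padding:
        parts.append(f"padding {node.padding}px")
    lines = [f"{indent}- {node.name}: {', '.join(parts)}"]
    # Recurse so nesting in the design becomes nesting in the prompt.
    for child in node.children:
        lines.append(node_to_prompt(child, depth + 1))
    return "\n".join(lines)
```

The design choice is simply that every value comes from the file rather than from a person's memory of it, so "a bit more spacing" becomes "padding 16px".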
Show Everything in One Image: Prompt + a Ten-Second Figma Touch-Up Turns a Long Web Page into a Cover (Freebies Inside)
歸藏的AI工具箱· 2025-05-06 08:09
Core Viewpoint: The article is a tutorial on generating a web page with AI tools and converting it into an image, stressing the importance of a good initial generation and offering practical tips for refining the result in design software.
Group 1: Web Page Generation
- The article walks through generating a web page, with DeepSeek-Prover-V2 as the example topic, and notes that relevant source documents such as papers or blog posts are needed as input [4][5].
- It stresses using specific prompt phrases, such as "try to display all information on one page", to keep the generated content concise and visually appealing [6][9].
- It outlines design principles for the generated page: large fonts for key points, a responsive layout for larger displays, and a clean visual style [9].
Group 2: Design Adjustments
- The article explains how to make manual adjustments in Figma: importing the web page and modifying elements for better visual coherence [12][15].
- It details how to refine the layout, for example by adjusting widths and making sure elements occupy the correct space, which improves the overall presentation [18][21].
- The final steps are making the margins uniform and exporting the result, with optional visual effects such as gradient borders [22][23].
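The design principles listed above can be sketched as a small HTML renderer. This is an assumption-laden illustration, not the page the article's prompt produces: the function `render_one_page`, the class names, the 1200px breakpoint, and the three-column layout are hypothetical choices. It shows large type for key points, a single column that spreads into columns on wide displays, and a plain, clean style.

```python
from html import escape


def render_one_page(title: str, key_points: list[str]) -> str:
    """Hypothetical one-page layout: big key points, wide-screen columns,
    minimal styling. Breakpoint and sizes are assumptions."""
    items = "\n".join(
        f'    <div class="point">{escape(p)}</div>' for p in key_points
    )
    # CSS braces are doubled because this is an f-string.
    return f"""<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  body {{ font-family: system-ui, sans-serif; margin: 0; padding: 48px; background: #fafafa; color: #111; }}
  h1 {{ font-size: 3rem; margin: 0 0 32px; }}
  .grid {{ display: grid; grid-template-columns: 1fr; gap: 24px; }}
  .point {{ font-size: 1.75rem; font-weight: 600; padding: 24px; background: #fff; border-radius: 12px; }}
  /* On wide screens, spread points across columns so everything fits on one screen */
  @media (min-width: 1200px) {{ .grid {{ grid-template-columns: repeat(3, 1fr); }} }}
</style>
</head>
<body>
  <h1>{escape(title)}</h1>
  <div class="grid">
{items}
  </div>
</body>
</html>"""
```

Writing the returned string to an `.html` file and opening it in a browser gives a page that can then be brought into Figma and touched up as described.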
100x Faster Design, 10x Better Quality: Trying Out Doubao Super Creative (超能创意) 1.0
歸藏的AI工具箱· 2025-04-29 08:18
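You have probably all seen what Doubao's new image model could do a little while ago. Strong prompt understanding, combined with the ability to generate typography and marketing images, lets anyone produce the marketing images they need or design fonts.
Just the day before yesterday, Doubao rolled out its Super Creative 1.0 mode. I was included in the gradual rollout, tried it, and was blown away. Image generation and editing have become dramatically more efficient, lowering an already low design barrier even further.
Let's look at an example first and then go through the details. My prompt was:
Using the prompt below as a reference, generate 16:9 capsule images for ten other well-known brands. First modify the content of the prompt based on each brand and its core business, then generate.
The example prompt was: A tall, realistic-looking, vibrant capsule floats horizontally. Its left half is in Starbucks' signature green, labeled "Starbucks – Uplifting the Everyday" along with the classic Siren logo. Its right half is transparent, filled with floating roasted coffee beans, delicate swirls of milk foam, hand-drawn coffee cup icons, and abstract warm-toned lines representing community connection. It needs a background color.
Look at the results it gave me. I never said which brands to use, nor what their core businesses or typical products are. It pulled that knowledge straight from the LLM and rewrote the prompt as instructed, which is wild. And generating these ten images was much faster than GPT-4o generating a single one.
I tested ...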
The Most Lavish Tool in the AI Pile: Hands-On with Nano AI's MCP Universal Toolbox (Invite Codes and Zang Shifu's Hand-Built Agent Inside)
歸藏的AI工具箱· 2025-04-28 10:45
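Last week was the hottest week for MCP since it was born, with everyone rushing to release MCP-based agent tools. Nano AI also released its own MCP-driven agent service, the "MCP Universal Toolbox".
The toolbox contains not only a dozen or so MCP tools developed in-house by Nano AI but also nearly a hundred third-party MCP tools, giving it the largest total tool count in China at present. The tools cover office collaboration, academic research, lifestyle services, search engines, finance, media and entertainment, data scraping, and more.
Today I finally had time to try it in detail, and I also built an agent on Nano AI's capabilities that generates showcase web pages in the same style as mine.
Deep Research Agent
First, the deep research agent. It works out of the box with no configuration: it calls Nano AI's planning and search capabilities and finally generates presentation content in various formats.
You can find the deep research agent in the agents section on the left side of the Nano AI client and click to use it.
Besides searching the web, it can also search your own personal knowledge base. Here, for example, I had it search both the web and my knowledge base for MCP-related content and generate an explanatory web page and a PDF file.
After 20 minutes of searching and thinking, it produced a very detailed report. The PDF is very rich, the run consumed 470,000 tokens in total, and all of it was free. For example, M ...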
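MCP tools are invoked over JSON-RPC 2.0, with `tools/list` to discover tools and `tools/call` to run one. The sketch below is a minimal, transport-free handler in that shape. It is not Nano AI's implementation; the two tools in `TOOLS` are hypothetical, and the sketch omits the initialization handshake and the `description`/`inputSchema` fields that a real MCP server publishes for each tool.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> function returning text.
TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: text,
    "word_count": lambda text: str(len(text.split())),
}


def handle_request(raw: str) -> str:
    """Minimal MCP-style JSON-RPC 2.0 handler for tools/list and tools/call."""
    req = json.loads(raw)
    resp: dict[str, Any] = {"jsonrpc": "2.0", "id": req.get("id")}
    method = req.get("method")
    params = req.get("params", {})
    if method == "tools/list":
        # Real servers also return a description and inputSchema per tool.
        resp["result"] = {"tools": [{"name": name} for name in TOOLS]}
    elif method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            # Unknown tool: reported as a JSON-RPC "invalid params" error.
            resp["error"] = {"code": -32602, "message": f"Unknown tool: {name}"}
        else:
            text = TOOLS[name](**params.get("arguments", {}))
            # Tool output is returned as a list of content items.
            resp["result"] = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        resp["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(resp)
```

An agent client would first send `tools/list`, let the model pick a tool, and then send `tools/call` with the chosen name and arguments; a toolbox with a hundred tools is the same loop over a larger registry.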
From Search to Solution: Unlocking the "Triple Jump" MCP Playbook of Volcano Engine DeepSearch
歸藏的AI工具箱· 2025-04-24 09:34
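MCP launches have been everywhere lately. Last week Volcano Engine held a developer meetup and released quite a few things, mainly:
The RTC hardware is hard to test, mostly because it is outside my expertise and requires the hardware itself, so this time I mainly tried the DeepSearch service.
In fact, the core of today's so-called agent services is still processing and reorganizing the information gathered by AI search. This is the heart of the product and also the part that demands the most technical capability. By packaging this capability as a service, Volcano Engine saves developers a lot of work: now anyone can build DeepSearch.
How well does it work?
Let's start with the most common test: travel planning. Even on this seemingly simple task, many AI search tools do poorly. The output looks long, but much of it is filler introducing each attraction. What users actually need is time-sensitive information, such as how to arrange transport, how to order stops along one route, and what to prepare for risky activities.
Officially released the Doubao deep thinking model Doubao-1.5-thinking-pro and the new visual understanding model Doubao-1.5-vision-pro. We covered these last week; the visual reasoning is very strong, and you can check out my tests if you are interested.
Also released Ark × RTC hardware: it bundles on-device automatic wake-up with cloud LLM voice capabilities, so that toys, home appliances, wearables, and other devices can be upgraded in one step to hold natural, real-time conversations with people ...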
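The point that users need specific, time-sensitive answers rather than filler can be sketched as a coverage-driven search loop. This is a hypothetical illustration, not Volcano Engine's DeepSearch API: `deep_search`, the `search` callback, and the aspect names are assumptions. The loop keeps issuing targeted queries only for aspects that still have no results, then returns findings grouped by aspect for a later summarization step.

```python
from typing import Callable


def deep_search(
    topic: str,
    aspects: list[str],
    search: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> dict[str, list[str]]:
    """Hypothetical DeepSearch-style loop: query each required aspect until it
    has at least one result or the round budget runs out."""
    findings: dict[str, list[str]] = {aspect: [] for aspect in aspects}
    for _ in range(max_rounds):
        missing = [aspect for aspect, hits in findings.items() if not hits]
        if not missing:
            break  # every aspect the user needs is covered
        for aspect in missing:
            # One targeted query per uncovered aspect, not a generic overview.
            findings[aspect].extend(search(f"{topic} {aspect}"))
    return findings
```

A call like `deep_search("Yunnan 5-day trip", ["transport", "route order", "safety gear"], my_search)` would work with any search backend wrapped as `my_search`; grouping results by aspect is what lets the final answer lead with transport and preparation rather than attraction descriptions.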
Zang Shifu's Web Page Generation Prompt 3.0 | Turns Out Gemini 2.5 Pro Is This Strong
歸藏的AI工具箱· 2025-04-23 08:32
This morning a friend in our group said he had used Deep Research in the Gemini app to produce an analysis document of Tesla's Q1 earnings, and another friend suggested turning it into a web page, so I said I'd give it a try.
I put his document and the prompt I've been developing recently straight into Chatwise. I used to generate web pages with Claude 3.7, but this time the default model was Gemini 2.5 Pro; I didn't even check and just hit Enter.
The generated web page turned out to be stunning. Gemini's page was rich in content and also captured the design style described in the prompt; it looks great.
You can look at the image or preview it here: https://kueaqan0fo.app.yourware.so/
[Screenshot of the generated Tesla Q1 earnings analysis page] ...
Don't Know 3D? No Problem: How to Create Stunning NFT-Style 3D Card Animations with AI
歸藏的AI工具箱· 2025-04-23 08:32
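Yesterday on Twitter I came across someone who used 4o and Kling to make brand 3D cards that looked stunning.
So yesterday afternoon I tried to replicate it and riff on it to see what else was possible, and it actually worked. The whole workflow is quite valuable and the approach applies in many other places, so I'll walk you through it.
First, the result: my variation generates NFT-style equipment cards. We can then write a backstory for the cards and build a website to showcase them, which makes the videos feel more valuable.
Part one: the main thing to learn here is how the JSON prompt is written. It abstracts every part that needs changing into a variable name, so we never touch the core of the prompt and only fill in the card content section below when making changes.
This method also makes it quick to create many images in a consistent style; for example, the three cards above all share the same style.
I modified the prompt to switch the overall theme to game equipment, which is more expressive once turned into video and fits the concept better.
- Use a JSON-style prompt, which makes it easy to produce highly consistent images; you only change the text in the parameter section.
- Turn the images into videos with Kling 1.6's first/last-frame mode; you'll need to learn how to use the same image as both first and last frame and how to write the prompt.
- Finally, the video showcase part, which I added: how to use Jianying (CapCut) to make your videos stand out.
I changed the parts unrelated to what is displayed into Chinese for convenience ...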
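The JSON-prompt technique, a fixed core plus a small block of variables, can be sketched as follows. The field names, style text, and aspect ratio are hypothetical placeholders, not the original author's prompt; the idea shown is that only the `card` section changes between images, which keeps the style consistent across a series.

```python
import copy
import json

# Hypothetical template: the "style" core stays fixed and only the "card"
# variables change between images. Field names are illustrative placeholders.
CORE_PROMPT = {
    "style": "glossy collectible 3D game-item card, studio lighting, dark background",
    "aspect_ratio": "3:4",
    "card": {"item_name": "", "rarity": "", "description": ""},
}


def fill_card_prompt(item_name: str, rarity: str, description: str) -> str:
    """Return the JSON prompt with only the card variables filled in."""
    prompt = copy.deepcopy(CORE_PROMPT)  # never mutate the shared template
    prompt["card"].update(item_name=item_name, rarity=rarity, description=description)
    # ensure_ascii=False keeps non-English card text (e.g. Chinese) readable
    return json.dumps(prompt, ensure_ascii=False, indent=2)
```

Each card in a series is then one call with new values; the resulting images can serve as identical first and last frames for the video step.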
Immersive Translate Releases Another Gem: The Ultimate PDF Translation Solution, and It's Still Generous
歸藏的AI工具箱· 2025-04-23 08:32
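I believe almost everyone in the AI community has Immersive Translate installed; if you read a lot of overseas content, you can hardly avoid it.
It can use AI or conventional translation to produce a side-by-side bilingual version of an entire web page, and it has the legendary trick of pressing the space bar three times to translate the Chinese in an input box into English.
Besides the great experience, it is also very generous: it offers a nearly unlimited free Google Translate quota and supports almost every model API, so you can plug in your own.
A few days ago I noticed a new feature: BabelDOC, which preserves a PDF's original layout during translation and can fully extract embedded non-text elements such as charts, footnotes, and formulas.
At first I didn't believe it. Most of us have tried plenty of similar PDF translation tools over the past year or two and know how hard it is to translate while keeping the layout intact.
I tried it on a random paper PDF, and it really works: the layout of the whole PDF was not off at all.
I then tried several PDFs that have been popular lately, and it was really impressive; the detailed tests follow. I'll also put the translated Google prompting PDF and HAI's 2025 AI report at the end of the article for anyone interested.
Let's start with something easy: typical papers, which usually don't have very complex layouts; the hard parts are mainly charts, tables, and formulas.
For example, the opening of a typical paper ...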
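The core idea of layout-preserving PDF translation can be sketched as: translate only the text blocks, keep every block's position, and pass non-text elements through untouched. This is an illustration under stated assumptions, not how BabelDOC works internally; the `Block` type, its `kind` values, and the `translate` callback are hypothetical, and real tools also have to reflow text that gets longer or shorter in translation.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Block:
    """Hypothetical page element with its position on the page."""
    kind: str                                # "text", "figure", "formula", "table", ...
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in points
    content: str


def translate_page(blocks: list[Block], translate: Callable[[str], str]) -> list[Block]:
    """Translate text blocks in place; every block keeps its bounding box and
    non-text blocks (figures, formulas, tables) are returned unchanged."""
    return [
        replace(block, content=translate(block.content)) if block.kind == "text" else block
        for block in blocks
    ]
```

Keeping the bounding box attached to each block is what lets the translated text be drawn back at the original position, so figures and formulas stay exactly where they were.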