歸藏的AI工具箱
A New King of AI Coding Is Crowned: Master Zang's Hands-On Test of Claude 4
歸藏的AI工具箱· 2025-05-22 18:00
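Claude 4 was released rather quietly. Their CEO previously said that by 2027 all code would be generated by AI, and it now looks as if he had already seen Claude 4's potential. According to Anthropic, Claude Opus 4 is the world's best coding model, with sustained strong performance on complex, long-running tasks and agent workflows.
Basic introduction: the release also includes some other items, including:
Most important, the pricing: Claude Sonnet 4 will be available to free users, which is great. API pricing stays the same as for the previous Opus and Sonnet models: Opus 4 costs $15/$75 per million input/output tokens, and Sonnet 4 costs $3/$15.
Model capabilities: Claude Opus 4's coding ability leads other models by a wide margin on SWE-bench (72.5%) and Terminal-bench (43.2%). It also sustains stable performance on long tasks that require focus and thousands of steps and can work continuously for several hours, which matters a great deal for agent products.
Extended thinking with tool use (beta): both models can use tools during extended thinking.
New model capabilities: both models can use tools in parallel, follow instructions more precisely, and, when developers grant local file access, show significantly enhanced ...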
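To make the per-million-token prices quoted above concrete, here is a minimal sketch of the cost arithmetic. The model labels `opus-4` and `sonnet-4` are shorthand chosen for this example (not official API model identifiers), and the token counts are a hypothetical workload.

```python
# Prices in USD per million tokens, as quoted in the summary above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES_PER_MILLION = {
    "opus-4": {"input": 15.0, "output": 75.0},
    "sonnet-4": {"input": 3.0, "output": 15.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a request with the given token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    for model in PRICES_PER_MILLION:
        print(model, round(estimate_cost(model, 200_000, 50_000), 2))
```

With these prices the same workload costs five times as much on Opus 4 as on Sonnet 4, since both the input and output prices differ by a factor of five.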
I Used This Product to Build a Website for Xiaomi's May 22 Launch Event. My Colleague: "Isn't This the Official One?"
歸藏的AI工具箱· 2025-05-22 09:24
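A few days ago I was invited to join the early test of Skywork Super Agents (天工超级智能体). After trying it, I found that, compared with the various do-everything "general agents", Skywork is very pragmatic: it focuses on helping office workers with the three deliverables we touch most every day and find most tedious, the so-called Office trio of documents, spreadsheets, and slides.
Skywork Super Agents does not stop once a deliverable is generated. It considers the content's whole life cycle, with many optimizations from intent detection to content retrieval to high-quality generation to editing and revision, so that the content is as usable as possible.
First, an overview of the main capabilities of Skywork Super Agents:
Web page generation: I noticed they have a web page generation mode, so it was time to pull out Master Zang's old test project. Isn't there a Xiaomi launch event tonight? I thought of a good way to test it: have it build a launch-event teaser page for Xiaomi. Besides testing how well it reproduces Master Zang's web page generation prompt, this also tests how well it retrieves the latest information, because much of that information is speculation and all of it was published in the last few days, so the retrieval quality is easy to judge. I also adapted the web page generation prompt to Xiaomi's design style; feel free to use it directly if you have a similar scenario.
You can watch the case replay here: https://www.skywork.ai/share/project/192542753810075238 ...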
A Think Tank for CEOs, a Lifeline for Interns: How This Feishu Feature Makes Everyone More Efficient
歸藏的AI工具箱· 2025-05-21 07:18
Core Viewpoint - Feishu's Knowledge Q&A feature significantly enhances workplace efficiency by providing tailored AI responses based on organizational data and internet knowledge, proving to be a valuable tool for employees at all levels [1][2][22].
Group 1: Product Overview
- Feishu Knowledge Q&A is a proprietary AI tool designed for enterprises, allowing users to ask questions and receive answers based on accessible organizational data, documents, and internet knowledge [2].
- The tool aids in content creation and enhances business understanding, making it versatile for various tasks [3].
Group 2: Practical Applications
- The feature allows users to quickly gather information about ongoing projects, reducing the time spent sifting through numerous documents [4].
- Users can perform targeted inquiries to understand specific aspects of their responsibilities, such as categorizing guest speakers and their topics for events [5].
- It can retrieve not only text but also relevant images, aiding in event preparation [7].
- The AI can provide comprehensive suggestions for event planning, covering aspects from venue selection to promotional strategies [9].
- It can generate progress report documents based on user queries, significantly reducing the time required for such tasks [12].
- The tool is particularly beneficial for middle and upper management, enabling them to access real-time data and updates without waiting for subordinate reports [17].
Group 3: Personal Development Support
- For individual users, Feishu Knowledge Q&A serves as a powerful AI knowledge base, helping to organize and optimize personal content and writing tasks [18][19].
- The tool can efficiently retrieve and analyze existing documents, providing structured insights and suggestions for improvement [19].
- It allows users to search specific knowledge bases for relevant information, streamlining the process of finding and organizing content [21].
Group 4: Competitive Advantage
- The effectiveness of Feishu Knowledge Q&A lies in its ability to leverage contextual information from organizational documents, which enhances the AI's understanding and response accuracy [22].
- The integration of rich organizational context is seen as a key differentiator compared to other AI products, making it a cost-effective solution for enterprise AI implementation [22].
Hands-On with Veo3 and FLOW: Google Pulled It Off This Time, and Video Creation May Change Completely
歸藏的AI工具箱· 2025-05-21 07:18
Core Viewpoint - Google's new video model Veo3 and AI video creation product FLOW represent a significant advancement in video generation technology, enhancing usability and application scenarios for video editing and digital content creation [1][29].
Group 1: Features of Veo3 and FLOW
- Veo3 can generate videos with corresponding ambient sounds and synchronized speech, greatly improving the usability for video editing software and digital avatars [2][29].
- FLOW allows for the generation of both images and videos, supports video extension and trimming, and enables users to compile selected clips into a complete video [2][15].
Group 2: Testing and Applications
- Testing of Veo3 demonstrated accurate lip-syncing and sound effects, even with complex animations, showcasing its potential for various applications [4][6].
- The model can generate diverse scenes, such as a character explaining gravity under an apple tree, indicating its capability for educational content [7].
- Veo3 can also create ASMR videos by generating realistic environmental sounds, expanding its application in content creation [8][9].
Group 3: FLOW Usage Tutorial
- FLOW provides a user-friendly interface for creating projects, where users can input prompts to generate videos [15][16].
- The platform supports three main video generation methods: text-to-video, image-to-video, and material-to-video, although it currently does not allow for external image uploads [20].
- Users can edit and arrange scenes, with the ability to download videos in high definition, although sound may require specific steps to be included [21][26].
Group 4: Conclusion and Future Implications
- The integration of sound generation, speech synthesis, and lip-syncing in Veo3 marks a significant upgrade in video modeling, similar to the advancements seen with the release of the 4o image model [29].
- The potential for new applications and products in various industries is vast, as demonstrated by the capabilities of Veo3 and FLOW [29].
These Promo Images Look So Premium! Master Zang Shows You How to Generate Them with 4o and Prompts
歸藏的AI工具箱· 2025-05-19 08:58
Core Viewpoint - The article discusses the process of creating visually appealing promotional graphics using AI-generated icons, specifically focusing on the integration of a new product, ListenHub, with design elements inspired by Airbnb's iconography [1][2].
Group 1: Icon Generation
- The process begins with generating icons that align with the promotional content of the product, utilizing AI tools to analyze the article and suggest appropriate icons [2].
- A specific example is provided where nine icons were created for ListenHub, reflecting the product's features [2].
Group 2: Design Style
- The design style is influenced by Airbnb's new icon aesthetics, emphasizing realistic textures and soft lighting, with a focus on materials like wood, metal, and plastic [5][7].
- The icons are designed to be minimalistic yet charming, with a premium feel, resembling early iOS design principles [11].
Group 3: Webpage Creation
- The article outlines the steps to upload images and generate a webpage using Markdown links, making the process user-friendly [16].
- Specific design requirements for the webpage are detailed, including color schemes, layout, and responsive design for larger displays [17][18].
Group 4: Final Touches
- The article concludes with suggestions for optimizing the generated webpage using tools like Figma, enhancing the overall presentation [19].
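The "upload images, then reference them by Markdown links" step in Group 3 can be sketched roughly as follows. This is an illustration only: the summary does not show the article's actual prompt, and the function name, alt texts, and `example.com` URLs are placeholders invented for this example.

```python
def to_markdown_images(icons: dict[str, str]) -> str:
    """Turn {alt text: image URL} pairs into Markdown image links, one per line."""
    return "\n".join(f"![{alt}]({url})" for alt, url in icons.items())


# Hypothetical placeholder URLs standing in for the uploaded icon images.
icons = {
    "podcast icon": "https://example.com/icons/podcast.png",
    "voice icon": "https://example.com/icons/voice.png",
}

# The Markdown links are appended to the page-generation request as plain text.
prompt = "Build a promotional page using these images:\n" + to_markdown_images(icons)
print(prompt)
```

Passing hosted image links as text keeps the request model-agnostic: any model that can emit HTML can reference the icons without needing image upload support.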
Can't Edit Video? Generate a Complete, Editable Video from One Sentence: Medeo Shows You the Future of Video Generation
歸藏的AI工具箱· 2025-05-16 08:11
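Over the past year people have kept asking me, "Master Zang, is there a product that generates a whole video from a single prompt? I would pay for it," or "Master Zang, I have a voice-over script and footage. Is there an AI product that can edit it for me?" I always told them it should be close, that it would arrive any day now. This time it is finally here!
Medeo ( https://ai.medeo.app/create ): a dedicated AI video studio for creators. No matter how much material you have, even if it is only one sentence, it can generate a complete video with voice-over and music. In this post I use several cases to show how powerful the product is, and I also share some usage tips.
First, some cases. The most basic capability: you provide footage or a voice-over script, and it does the editing and generates the video. This suits news-style content or any need that demands tight control over the content. You can ask it to follow your voice-over script strictly, or give it the information and let it improvise. In the example below, the video on the left is what it improvised after I gave it a statement from Dia's CEO, and the one on the right is what it generated by following the script exactly. I also provided some Dia screenshots and videos; if those are not enough, it finds matching footage on its own, and the total cost is very low. While other content reposters are still copying text, you drop in a single link and the video is already done.
For the explainer video below, my entire prompt was just this one passage, with no intervention at all, and all ...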
Speech-02 Voice Model Tops International Leaderboards: It Clones Voices So Well That Colleagues Could Not Tell Real from Fake
歸藏的AI工具箱· 2025-05-15 09:14
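Master Zang usually listens to music on the morning commute and before showering. I like reading novels, but I used to sneer at AI-generated audiobooks. Then one day I happened to try Qidian's new audiobook feature and found it had become remarkably good. In a conversation a few days ago I learned that its speech generation service runs on MiniMax's Speech model, and that it uses my favorite "Storyteller" voice.
I recently noticed their updated Speech-02 audio model, which beats OpenAI, ElevenLabs, and a whole field of overseas audio models on Artificial Analysis's Elo leaderboard, more or less dominating the chart. Unsurprisingly, it also ranks first on Hugging Face.
The biggest innovation in Speech-02 is a learnable speaker encoder that extracts timbre features from reference audio without needing a transcript of that audio. Many capabilities build on this. For example, a clip of only ten-odd seconds of speech is enough for high-quality voice reference. Because the speaker encoder captures language-independent timbre features, the timbre can also be transferred to other languages, which is very helpful for taking content overseas. Speech-02 also brings very high extensibility: a timbre can be used in downstream tasks such as emotion control, text-to-voice, and professional voice reference without switching models. In addition ...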
Without Reading Master Zang's Deep Dive, You Will Never Know How Scary-Good Lovart AI Is (Invite Codes Included)
歸藏的AI工具箱· 2025-05-13 08:42
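Your feeds were probably flooded with Lovart AI today. I was invited to test it a while ago, so first look at the images to get a sense of the product's basic capabilities. When I saw GPT-4o's images, I knew the conditions for a general-purpose design agent were ripe; I just did not expect this team to be the first to ship one. And the results are this good: the product serves both design novices and professional users well, and lowers the barrier to design to an outrageous degree. Whether you are a small-business owner, a marketer, a designer, or an e-commerce designer, as long as you can type and describe roughly what you want, even in just a few words, you can get high-quality design output.
A deep dive into the workflow: let's start with a case. This is a video ad for a perfume, and it is very polished, yet my prompt was only a few dozen characters long. Judging from the prompt, I was not even expecting it to produce the video directly; everything I wrote was preparation for making the video later.
Prompt: Generate all the storyboard images needed for a 30-second ad for this product. I will later use these images to guide the shoot and generate the video.
Many readers may have no idea how hard it is for AI to produce this fully automatically. Let me break down step by step how Lovart's agent did it and walk through my own reactions along the way; by the end I was stunned. You would normally assume it starts generating prompts and drawing right away, but it did not. Lovart did a great deal of analysis first, more professionally than some designers. First, based on my ...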
AI Needs to "Take Notes" Too: The Future Karpathy Sees in Claude's 16,000-Word Prompt
歸藏的AI工具箱· 2025-05-12 08:28
Core Viewpoint - The article discusses the significance of system prompts in large language models (LLMs), particularly focusing on Claude's extensive system prompt and the potential for a new learning paradigm termed "system prompt learning" proposed by Karpathy [6][12].
Group 1: System Prompts Overview
- Claude's system prompt consists of 16,739 words, significantly longer than OpenAI's ChatGPT o4-mini, which has only 2,218 words, representing just 13% of Claude's prompt [2][3].
- System prompts serve as an initial instruction manual for LLMs, guiding their roles, rules, and response styles [4].
- The content of Claude's system prompt includes tool definitions, user preferences, and guidelines for various tasks, indicating a structured approach to AI interactions [8].
Group 2: Current Learning Paradigms
- The existing learning paradigms for LLMs include pretraining, which provides broad knowledge through large datasets, and finetuning, which adjusts model behavior through parameter updates [9].
- Unlike LLMs, humans often learn by summarizing experiences and strategies, akin to "note-taking," rather than solely relying on parameter updates [10].
Group 3: System Prompt Learning
- Karpathy suggests that LLMs should adopt a "system prompt learning" mechanism, allowing them to store strategies and knowledge in an explicit format, enhancing efficiency and scalability [10][12].
- This new learning paradigm could lead to more effective data utilization and improved generalization capabilities for LLMs [19].
Group 4: Practical Implications
- Clear and detailed instructions in system prompts lead to more accurate AI responses, emphasizing the importance of structured communication [13][14].
- The article highlights that "prompt engineering" is an extension of everyday communication skills, making it accessible for ordinary users [16].
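As a rough illustration of the "note-taking" idea in Group 3, the sketch below keeps learned strategies as explicit text that is appended to a base system prompt rather than baked into model weights. It is not Karpathy's proposal or any vendor's implementation: the class name, method names, and the "Lessons learned so far" heading are all hypothetical, and no model is called.

```python
from dataclasses import dataclass, field


@dataclass
class NotebookPrompt:
    """A base system prompt plus explicit, human-readable notes collected over time."""

    base_prompt: str
    notes: list[str] = field(default_factory=list)

    def add_note(self, note: str) -> None:
        """Record a strategy as text; empty or duplicate notes are ignored."""
        note = note.strip()
        if note and note not in self.notes:
            self.notes.append(note)

    def render(self) -> str:
        """Build the system prompt text that would accompany the next request."""
        if not self.notes:
            return self.base_prompt
        bullets = "\n".join(f"- {n}" for n in self.notes)
        return f"{self.base_prompt}\n\nLessons learned so far:\n{bullets}"


notebook = NotebookPrompt("You are a helpful assistant.")
notebook.add_note("When counting letters in a word, spell it out one letter at a time.")
print(notebook.render())
```

Because the notes are plain text, they can be read, edited, or deleted by a person, which is the contrast with parameter updates that the summary draws.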
Web Page Generation Can Now Take a Video as Reference? How to Use Gemini 2.5's Most Powerful Capability
歸藏的AI工具箱· 2025-05-09 08:34
Core Viewpoint - The article highlights the advanced capabilities of Gemini 2.5 Pro 0506, particularly its ability to generate high-fidelity web effects from uploaded interactive videos, showcasing significant improvements in front-end development and user interface design [1][4].
Group 1: Version Overview
- Gemini 2.5 Pro 0506 was released on May 6, 2025, in preparation for the Google I/O conference [4].
- The main updates include substantial enhancements in front-end and user interface development, as well as improvements in basic coding tasks such as code conversion and editing [4].
Group 2: Testing and Capabilities
- Initial tests demonstrated that Gemini can create interactive web pages from videos, leveraging its strong video multimodal understanding capabilities [5][6].
- Further tests revealed that while Gemini performs well in generating interactive animations, it may overlook some finer details, such as color changes and spacing [7][8].
Group 3: Usage Guidelines
- A template for effective prompts was provided, emphasizing the need to describe key animation effects and details that Gemini might miss due to its limitations [10][11].
- Users are advised to upload videos to AI Studio for optimal results, ensuring videos are compressed and not too lengthy to maintain context [13].
Group 4: Conclusion and Community Engagement
- The article concludes by encouraging users to explore the potential of Gemini's capabilities beyond simple animations and invites community discussion for further innovative applications [14].
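The summary mentions a prompt template but does not reproduce it, so the following is only a guess at its general shape: a text prompt that names the key animation effects and spells out the details (color changes, spacing) that the model tends to miss. The function name, wording, and sample effects are all assumptions made for illustration; the resulting text would be pasted alongside the uploaded video.

```python
def build_video_to_web_prompt(key_effects: list[str], easy_to_miss: list[str]) -> str:
    """Assemble a text prompt to accompany an uploaded interaction video."""
    lines = ["Recreate the interaction shown in the attached video as a single web page."]
    if key_effects:
        lines.append("Key animation effects:")
        lines.extend(f"- {effect}" for effect in key_effects)
    if easy_to_miss:
        lines.append("Details that are easy to miss:")
        lines.extend(f"- {detail}" for detail in easy_to_miss)
    return "\n".join(lines)


# Hypothetical effects and details for a card-list interaction.
prompt = build_video_to_web_prompt(
    key_effects=["cards expand on hover", "list items fade in one by one"],
    easy_to_miss=["button color changes on press", "spacing between cards stays constant"],
)
print(prompt)
```

Keeping the easy-to-miss details in their own list makes it simple to add one more line each time a generated page drops a detail from the video.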