歸藏的AI工具箱
Incredible! Google quietly slipped an N8N into Gemini
歸藏的AI工具箱· 2025-12-19 09:28
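A few days ago, while playing around in Gemini, I noticed that Google's Gem feature, its counterpart to GPT's GPTs, had been updated. It used to be very weak: basically all it did was save a prompt and let you give it a name. Now it can directly generate a web app with an interface for you. It accepts any image or document as input, can produce web pages as output, and can call all of Google's models, which makes it quite powerful. For example, I used it to build a screen-time analysis tool: upload your screen-time data and it creates a web page that displays and analyzes your usage. The result includes a very attractive visual poster, a text analysis, and an audio blog with advice based on your screen time. The creation page is just a simple input box; you tell it what you want to build. After digging in, I found that this is Opal, the N8N-like agent-building tool Google released a few days ago, now built into Gemini and made easier to use. Today I'll walk through how to use it, along with some advanced Opal techniques. First, the entry point: inside Gemini, find the "探索Gem" (Explore Gems) option in the sidebar and open it. Once inside, in addition to the previous Gem interface and settings, you'll see at the top a brand-new experimental Gem ...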
ByteDance Seedance 1.5 Pro, tested by Master Zang (藏师傅): a video model with joint audio-video generation that can speak dialects
歸藏的AI工具箱· 2025-12-18 04:38
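A few days ago ByteDance released the Seedance 1.5 Pro video generation model. The headline is that it now supports joint audio-video generation, and a lot of work has gone into localization. First, a montage of Master Zang's test videos:
Combining the official introduction with my own test results, here are the main upgrades:
1. Video now supports synchronized audio-video generation and multiple mainstream dialects, with significantly better lip-sync and intonation alignment; the dialect results are very good;
2. Enhanced semantic understanding: the model parses narrative context fairly well, with major gains in emotion control synchronized between sound and picture and in professional-level acting;
3. Precise and rich camera control with autonomous camera placement; long takes, dolly zooms, Hitchcock zooms and the like are all no problem;
4. Supports first-and-last-frame video generation, up to 12 seconds in a single generation, with 5-second and 10-second options as well.
Surprising and unique dialect effects
Joint audio-video generation can output dialect speech directly, which is a real surprise. In Chinese film and television, dialect has long been a very effective way to give characters authenticity and local flavor, so some film and TV applications now have room to expand.
Prompt: The camera pushes in on a dark-faced old Shaanxi man squatting on a bench, holding a blue-and-white porcelain bowl as big as a washbasin, the noodles inside thoroughly coated in bright red chili oil. In his left hand he pinches a clove of garlic and bites off half of it with a crunch; with his right hand he shovels a big mouthful of noodles into his mouth, slurping thunderously. When he looks up, his mouth is covered in red oil; he glares at the camera, his whole face ...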
Medeo tutorial: one-shot generation and mindless rerolling won't cut it; what a real video agent should look like
歸藏的AI工具箱· 2025-12-15 23:06
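Earlier this year I introduced version 0.5 of Medeo, an AI video generation agent; at the time they were already pioneers in this category. Many more video agents have launched since, and I have tried a number of them, but most follow very rigid execution paths: either they generalize poorly, or they cannot be directed through natural language to revise and adjust the result at all. A few days ago I got access to Medeo 1.0. It is a big step forward, and after trying it I was quite impressed. There is an invite-code giveaway at the end of the article. Getting decent results from very short prompts is table stakes, but Medeo also supports very flexible revision through natural language, accepts extra-long prompts of over a thousand characters, and generalizes very well: it can handle all kinds of styles and vertical video categories. First, a few videos I made with it: This one explains how hard it is to recover the Falcon 9 booster, laying out the significance and difficulty of Falcon 9 recovery clearly and intuitively. A promo for the Vibe Coding keyboard I designed; it can faithfully reproduce any product, even a brand-new design. Converting any novel or TV series into the style of the 哈基米 (Hachimi) universe; here it is the part of 《诡秘之主》 (Lord of the Mysteries) where Klein transforms. I have distilled the prompts for all of these videos so you can replicate them in one click, and they are quite general, basically covering an entire category. It lets top creators compress their creative intelligence and creative logic into ...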
Gemini 3 + Nano Banana Pro + 3D generation + gesture control = ? Master Zang (藏师傅) shows you a flashy way to display your sports achievements
歸藏的AI工具箱· 2025-12-05 12:02
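A few days ago, while still playing with Nano Banana Pro, I put together a set of prompts that place your travel destinations and footprints inside a jar, and the results are beautiful. Many readers have shared their own results. The prompts are here: "Put your travel memories in a jar | Prompts". As it happens, Master Zang is also a rookie outdoor sports enthusiast, so I wondered whether I could create a set of image prompts and posters that let outdoor enthusiasts showcase their achievements with Nano Banana Pro. I have been working on this for the past few days, and it turned out better than expected. Here are the results: Whether you hike, ski, cycle, or camp, there is a matching prompt and presentation style. It shows not only your stats but also your gear, plus a miniature model of the place you visited and its weather, so you can show off while protecting your privacy. Pretty, right? It is cute while still displaying your results and gear, ideal for posting alongside check-in photos. Think that's the end? It isn't. Because these miniature models have relatively few polygons, they are well suited to conversion into 3D, so I turned the images into 3D models and built an app to display them, which makes the whole thing even cooler. Think that's the end now? Still no. The gesture-controlled 3D model interface that Gemini wrote went viral a few days ago, so Master Zang made one too and added gesture control to this product to make it even more impressive: swipe your palm left to stop the rotation, swipe right to resume it, pinch your fingers to zoom out, and open your palm to zoom in. Curious how all of this ...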
Video enters the editable era: Master Zang (藏师傅) walks you through 可灵 (Kling) O1, the Banana of video
歸藏的AI工具箱· 2025-12-02 05:18
Core Viewpoint
- The article introduces the launch of 可灵's O1, a unified video and image generation and editing tool that integrates multiple tasks into a single interface, allowing for seamless video and image editing and generation.
Group 1: Features of O1
- O1 integrates multi-modal video models, combining reference videos, text-to-video, frame manipulation, content addition/removal, and style redrawing into a one-stop solution for generation and modification [2].
- It supports multi-modal inputs including images, videos, subjects, and text, enabling precise editing through natural language without the need for masks or keyframes [2][4].
- The tool maintains consistency in character, props, and scene features across shots through multi-angle subjects and reference materials, ensuring coherent visuals [2].
Group 2: Editing Capabilities
- Users can generate narrative shots lasting approximately 3 to 10 seconds, allowing for flexible control over pacing and shot length [2].
- The editing process allows for direct modifications through text prompts, where users can upload videos and specify changes using references [4][6].
- O1 supports the use of single or multiple reference images for background or character modifications, enhancing the realism of the final output [7].
Group 3: Subject Creation and Consistency
- O1 introduces a new element called "subject," which allows users to create and select characters for easier integration into videos without frequent uploads [10][13].
- Users can upload multiple images from different angles to improve consistency in character and scene representation during video generation [13][17].
- The tool is particularly beneficial for e-commerce, as it ensures that products remain consistent in appearance during various camera movements [17].
Group 4: Style and Frame Generation
- O1 allows users to convert video styles easily, supporting various artistic styles such as felt, anime, and 8-bit pixel [19].
- The tool also supports frame generation, enabling users to create complex effects by combining image references with frame inputs [20][21].
- The overall capabilities of O1 in video editing are seen as a significant advancement, with the potential for creating impressive effects with minimal effort [29].
Master Zang (藏师傅) uses Nano Banana Pro to take you anywhere you want to go
歸藏的AI工具箱· 2025-11-25 12:59
Core Insights
- The article discusses the capabilities of the newly released Nano Banana Pro, particularly its ability to generate location-specific images based on geographical coordinates [1][2].
- It highlights the integration of real-time data such as current time and weather conditions to enhance the realism of generated images [2][11].
- The article introduces various features of the product, including a "Travel Portrait" function that allows users to create personalized images at chosen locations [13][15].
Feature Overview
- The Nano Banana Pro can generate images in two modes: Scenery mode for landscape photos and Travel Portrait mode for personalized images [8][13].
- Users can upload their own photos to create customized images that reflect the current weather and time at the selected location [15][18].
- The product includes a "Time Machine" feature that allows users to simulate images from different historical periods or alternate realities [20][21].
Additional Functionalities
- The "Prank Mode" feature adds unexpected elements to the generated images, enhancing the fun aspect of the application [23].
- The article emphasizes the potential for creative combinations of prompts to yield unique and imaginative results [25].
- Users can quickly generate images using preset examples available on the platform [28].
Usage Instructions
- The article provides guidance on accessing the product through various channels, including AI Studio, Poe, and Youware, each with different functionalities and requirements [30].
- Users can obtain geographical coordinates from Google Maps to create images that reflect specific locations and conditions [31].
What sparks will fly between Nano Banana Pro and the top-tier design agent Lovart?
歸藏的AI工具箱· 2025-11-22 12:50
Core Viewpoint
- Google has launched the optimized Nano Banana Pro model based on Gemini 3, significantly enhancing its capabilities and addressing multilingual issues [2]
Group 1: Lovart's Free Activity
- Lovart is offering free access to Nano Banana Pro from November 21 to November 23, allowing all users to utilize the model without points for 365 days upon subscribing to Basic or higher membership [3]
- Existing Basic and higher-level members will automatically receive the same 365-day unlimited access to Nano Banana Pro [3]
Group 2: Usage Instructions
- To avoid point deductions, users are advised to operate within the canvas, which allows direct model selection and image uploads without invoking other models [5]
- Users can specify the model by using the "@" symbol followed by the model name in the input box [7]
- Another method involves selecting the desired model from the model selection icon in the input area, streamlining the process [9]
Group 3: Case Studies
- A notable application involves combining anime characters with realistic scenes, creating visually striking images [11]
- The process has been simplified to generate a realistic environment first and then add anime characters, avoiding the issue of the entire scene becoming anime-styled [15]
- The model can generate images based on specific geographic coordinates, incorporating real-time weather and time information to enhance realism [19][20]
Group 4: Enhanced PPT Generation
- Lovart can generate PowerPoint presentations with greater flexibility compared to NotebookLM, allowing users to create entire sets of slides based on prompts [30]
- Various styles for PPT generation have been outlined, including hand-drawn, minimalist, and themed designs, ensuring consistency across slides [36][41]
- The model's ability to generate high-resolution images results in clearer text and fewer rendering issues compared to competitors [47]
Group 5: Model and Agent Synergy
- The integration of Lovart enhances the capabilities of the Nano Banana Pro model, improving batch generation, consistency, and the ability to leverage more features [48]
The ultimate unorthodox tinkerer takes on Nano Banana Pro again: so many ways to play, this thing is a beast!
歸藏的AI工具箱· 2025-11-20 17:30
Core Insights
- The article discusses the capabilities of the newly released Nano Banana Pro model, highlighting its advanced features in image generation and editing, particularly its support for real-time knowledge and reasoning, which significantly enhances its functionality [2][69].
Group 1: Model Capabilities
- The Nano Banana Pro model has improved world knowledge and reasoning abilities, allowing it to generate accurate visual content based on real-time information [5][69].
- It can create detailed UI designs, such as a weather UI based on current weather data, showcasing its ability to integrate multiple elements and maintain consistency across images [9][11].
- The model supports multi-language capabilities, including strong performance in Chinese, enabling it to generate complex content with mixed languages without errors [14][15][17].
Group 2: Image Generation and Design
- The model can generate high-quality collages and themed designs, maintaining the integrity of uploaded images while adding creative elements like handwritten notes and artistic fonts [20][22][24].
- It demonstrates strong consistency in product design, effectively transferring details from original images to new designs, which is crucial for e-commerce applications [27][29].
- The model's ability to adapt to various styles and themes is evident in its capacity to create modern and abstract designs, enhancing the overall aesthetic quality of generated images [57][60].
Group 3: User Applications and Accessibility
- The Nano Banana Pro is integrated into various applications such as Lovart, Listenhub, and Flowith, making it widely accessible for users [67].
- Users can access a free version of the model through the Gemini app, although with limited resolution, while premium features are available for paid users [67][69].
- The rapid development and enhancement of the model within a few months reflect the company's commitment to innovation in AI-driven image generation [69].
Slower and deeper | Master Zang (藏师傅) helps you see the real capabilities of Gemini 3
歸藏的AI工具箱· 2025-11-19 08:04
Core Insights
- The article discusses the performance of Gemini 3, highlighting its state-of-the-art (SOTA) capabilities across various benchmarks, significantly outperforming competitors in most categories [1][2].
Benchmark Performance
- Gemini 3 Pro achieved the highest scores in several benchmarks, including:
  - 91.9% in GPQA Diamond for scientific knowledge [2]
  - 95.0% in AIME 2025 for mathematics without tools [2]
  - 100% in AIME 2025 with code execution [2]
  - 87.6% in Video-MMMU for knowledge acquisition from videos [2]
  - 2,439 Elo Rating in LiveCodeBench Pro for competitive coding [2]
- In the ARC-AGI-2 visual reasoning puzzles, Gemini 3 scored 31.1%, significantly higher than its competitors [2].
Multimodal Understanding
- The article emphasizes Gemini 3's strong multimodal understanding capabilities, particularly in analyzing video content and generating detailed summaries [6][8].
- It successfully analyzed a complex video, providing detailed insights into each scene and suggesting design tools for implementation [7][8].
Design and Coding Capabilities
- Gemini 3 demonstrated advanced design capabilities by generating a complete design agent platform that can autonomously create images and videos based on user prompts [12][14].
- The AI was able to replicate complex design tasks, including logo design and packaging, showcasing its potential for practical applications in design [14][20].
Interactive Content Generation
- The AI's ability to generate interactive content was highlighted, with examples of creating interactive games and visual novels based on user-provided scripts [34][36].
- This capability opens up new opportunities for content creation, allowing users to develop engaging narratives and gameplay experiences with minimal input [35].
Technical Implementation
- The article provides detailed prompts for users to leverage Gemini 3's capabilities in web development, including creating a storytelling webpage and generating 3D voxel animations from images [26][44].
- The technical requirements emphasize the use of modern web technologies, ensuring that the generated content is visually appealing and functionally robust [28][43].
Alibaba's "blitz" strikes again, this time with the 千问 (Qianwen) app
歸藏的AI工具箱· 2025-11-17 04:04
Core Insights
- Alibaba's influence in the AI sector is significant, being one of the few companies capable of competing with Google and OpenAI in both model variety and capability [1]
- The recently released Qwen3-Max model demonstrates strong capabilities, ranking just below the leading models from major overseas competitors, while the open-source Qwen3-235B is the top open-source model on Lmarena [1]
- Alibaba has developed a comprehensive suite of AI models, covering a wide range of applications including video generation, translation, image editing, and more, positioning itself as a formidable competitor in the AI landscape [4][7]
Model Performance and Popularity
- Qwen models dominate the download rankings on Huggingface, with over half of the top ten models being Qwen variants, indicating their popularity and acceptance in the community [2]
- The Qwen3-Max model scored 1432 in evaluations, showcasing its competitive edge against other proprietary models [2]
Application Features
- The newly launched Qwen-based Qianwen app serves as a primary entry point for users, integrating various AI capabilities to perform common tasks effectively [8]
- The app offers a user-friendly design, allowing users to trigger functions using natural language, making it accessible to a broader audience [10]
- Key features include image recognition, real-time translation, and comprehensive health report analysis, demonstrating the app's versatility [20][24][25]
User Experience and Accessibility
- The Qianwen app provides free access to its features, including video generation with a daily limit of 15 uses, making it appealing to everyday users [12][43]
- Users can generate detailed reports and summaries from complex documents, enhancing the app's utility for personal and professional use [30][31]
Community and Ecosystem Integration
- Alibaba's ecosystem, including platforms like Taobao and DingTalk, enhances the potential for the Qwen models to be integrated into various applications, expanding their reach and functionality [8]
- The app's design and functionality are tailored to meet user needs, with a focus on clarity and ease of use, which is crucial for attracting non-technical users [49]