歸藏的AI工具箱
One of a Kind! A PPT Generation Agent with Animated Transitions: Usage Tutorial and Design Thinking
歸藏的AI工具箱· 2026-01-13 07:28
Core Viewpoint - The article discusses the development of a new skill for generating PowerPoint presentations with enhanced features, including animated transitions and video exports, using AI technology.
Group 1: Skill Features
- The updated PPT generation skill now prompts users to choose whether to include video transitions, and exports both an image presentation and a video presentation [5][9]
- The video presentation includes a webpage designed for easy playback, featuring a dynamic cover page that loops to capture audience attention [6][7]
- The skill automatically saves all generated PPT images and provides a webpage for presentation control [27]
Group 2: Installation and Setup
- The project is open-sourced, with a detailed installation guide provided for users to set up the skill in CLI tools like Claude Code or OpenCode [13][15]
- Users need to prepare API keys from Google and Kling to utilize the image generation and video transition features, with specific instructions on obtaining these keys [18][19]
- The installation process involves creating a skill directory, cloning the project, installing dependencies, and configuring API keys [22]
Group 3: Workflow and Process
- The workflow for generating presentations involves analyzing user input documents, generating images, and creating video transitions using specified APIs [34][35]
- A meta-prompt is designed to generate specific prompts based on the images created, which is expected to add significant value in future applications [36]
- The article emphasizes the complexity of the FFmpeg video composition process, which integrates images and videos into a cohesive presentation [38]
Group 4: Insights and Future Implications
- The development of this skill reflects a significant advancement in AI capabilities, suggesting that AI coding is reaching a critical point where it can self-direct and replicate functionalities [41]
- The author notes that the cost of developing this skill was approximately $20, highlighting the affordability of creating advanced AI-driven tools [40]
- The article concludes with a reflection on the potential future significance of these developments in AI technology [42]
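The FFmpeg composition step mentioned under Group 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names, file names, and the 5-second slide duration are hypothetical, and the sketch assumes every segment shares the same codec, resolution, and frame rate so that FFmpeg's concat demuxer can join them without re-encoding. It only builds the commands; it does not run FFmpeg.

```python
from pathlib import Path


def still_to_clip_cmd(image: str, out: str, seconds: float = 5.0, fps: int = 30) -> list[str]:
    """Build an ffmpeg command that turns one slide image into a short clip."""
    return [
        "ffmpeg", "-y",
        "-loop", "1",           # keep feeding the single input image
        "-i", image,
        "-t", str(seconds),     # clip duration in seconds (assumed value)
        "-r", str(fps),         # output frame rate
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        out,
    ]


def interleave(slide_clips: list[str], transition_clips: list[str]) -> list[str]:
    """Order segments as slide, transition, slide, ..., slide."""
    if len(transition_clips) != len(slide_clips) - 1:
        raise ValueError("expected exactly one transition between each pair of slides")
    ordered: list[str] = []
    for i, clip in enumerate(slide_clips):
        ordered.append(clip)
        if i < len(transition_clips):
            ordered.append(transition_clips[i])
    return ordered


def concat_list_text(segments: list[str]) -> str:
    """Text of a concat-demuxer list file: one `file '...'` line per segment."""
    return "".join(f"file '{Path(s).as_posix()}'\n" for s in segments)


def concat_cmd(list_file: str, out: str) -> list[str]:
    """Build an ffmpeg command that joins the listed segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out]
```

To use it, one would write the result of `concat_list_text(...)` to a list file and pass each command to `subprocess.run(cmd, check=True)`. Stream copy (`-c copy`) is chosen here because it avoids a second encode; if the transition clips come back from the video API in a different format, they would need to be re-encoded first.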
A Quick and Simple Way to Build PPT Generation Skills with Claude Code
歸藏的AI工具箱· 2026-01-09 08:16
Core Insights - The article discusses the rising popularity of Claude Code, emphasizing its powerful programming capabilities and the ability to create simple agents with AI, which significantly extends what it can do [2]
Installation Process
- Users must have Claude Code installed and a paid Google API key to use the Skills feature [3]
- Installation can be simplified by using provided prompts that let Claude Code or similar coding agents automate the installation process [5][6]
- The article provides a detailed step-by-step guide for installing the PPT Generator Skill, including cloning the project from GitHub, creating a Python virtual environment, installing dependencies, and configuring system environment variables [8][9]
Usage of Skills
- After installation, users can generate PPTs by placing text files in a designated folder and instructing Claude Code to create a PPT based on the document [11]
- The Skill includes two built-in themes for PPT generation, allowing users to choose their preferred style during the process [12]
Creating Skills
- The article stresses clearly defining the purpose of a Skill before creating it, and suggests preparing the necessary materials, such as style prompts and API documentation [17][18]
- It recommends using Markdown files for better compatibility with AI models and emphasizes thorough preparation to avoid errors during the generation process [20]
- Users who are unsure what to prepare can ask Claude Code to help outline the requirements [21]
Iteration and Version Control
- Once a Skill is created, users are encouraged to use Git for version control to manage changes and iterations effectively [27]
- The article explains how to create a Git repository and commit Skills to it for version management [28]
Limitations and Flexibility
- Skills have limitations, particularly with complex functionality, and the article advises packaging different tools into separate Skills to reduce the probability of errors and increase flexibility [36]
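The installation steps listed under Installation Process (clone, create a virtual environment, install dependencies) could be scripted roughly as below. This is a sketch only: the repository URL is left as a parameter because the summary does not give it, and the `ppt-generator` folder name, the `.venv` directory, and the `requirements.txt` file are assumptions about the project layout. The function only builds the commands; it does not run them.

```python
import sys
from pathlib import Path


def install_plan(repo_url: str, skills_dir: Path) -> list[list[str]]:
    """Return the commands for a clone / venv / install sequence, in order."""
    target = skills_dir / "ppt-generator"  # hypothetical folder name
    venv_dir = target / ".venv"            # hypothetical venv location
    # A venv keeps its executables in Scripts/ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    pip = venv_dir / bin_dir / "pip"
    return [
        ["git", "clone", repo_url, str(target)],
        [sys.executable, "-m", "venv", str(venv_dir)],
        # requirements.txt is an assumed file name, not confirmed by the article.
        [str(pip), "install", "-r", str(target / "requirements.txt")],
    ]
```

Each command can then be passed to `subprocess.run(cmd, check=True)`. Returning the plan as data rather than executing it immediately makes it easy to print and review the commands first, which matters when a coding agent is the one running them.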
From Big-Tech Designer to Super One-Person Company: A 6,000-Character Review of My 2025 with AI
歸藏的AI工具箱· 2025-12-30 10:34
Core Insights - The article reflects on significant changes and developments in the AI industry and personal career transitions over the past year, highlighting the importance of adapting to new technologies and platforms [2][3].
Group 1: Personal Career Changes
- The author transitioned from a designer at a large company to a freelancer, focusing on leveraging AI to create a sustainable one-person business that benefits industry peers [4].
- The shift in focus from self-judgment based on data to long-term interests and skills has led to a more relaxed yet productive work rhythm [4].
Group 2: Social Media and Content Creation
- The author does not identify as a traditional content creator, which has helped avoid data anxiety and internal conflict, although it has also led to slower adaptation to platform changes [5][6].
- Twitter and Jike have been primary platforms for engagement, with the author achieving a significant following of nearly 25,000 on Jike and 110,000 on Twitter, emphasizing the importance of interaction with international users [12][10].
- The author has started producing videos, which have performed well on platforms like Douyin and Xiaohongshu, indicating a shift towards video content as a necessary adaptation in the AI landscape [17][19].
Group 3: AI Community and Networking
- The author has developed a paid community to support the AIGC Weekly, which has proven effective in fostering collaboration and sharing among members [21][30].
- A recent promotional event for the community attracted around 2,000 paid members, showcasing the potential for community-driven marketing strategies [28].
Group 4: AI Product Development
- The article discusses the rise of Vibe Coding and Agent tools, highlighting their significance in the AI programming landscape and the author's contributions to tutorials and community knowledge sharing [38][34].
- The author has engaged with various AI product teams, gaining insights that enhance understanding of industry trends and product development [43].
Group 5: Future Trends in AI
- The article anticipates key technological breakthroughs in AI, particularly in reinforcement learning and multi-modal capabilities, which are expected to drive significant advancements in the coming years [52][55].
- The emergence of products like Chatwise and Manus is noted for their potential to redefine user interaction with AI, indicating a shift towards more integrated and user-friendly AI solutions [58][60].
Wild! Google Quietly Slipped an N8N into Gemini
歸藏的AI工具箱· 2025-12-19 09:28
Core Insights - Google has updated its Gemini platform, enhancing its capabilities to generate web applications with interfaces, supporting various inputs like images and documents, and utilizing all Google models, making it significantly more powerful than before [2][6].
Group 1: New Features and Functionalities
- The updated Gemini allows users to create web applications that can analyze data, such as screen time usage, and present it in a visually appealing format, including text analysis and audio blogs [4][19].
- The integration of Opal, a tool similar to N8N, into Gemini simplifies the process of building applications, making it more user-friendly [6][21].
- Users can easily create new Gems by navigating to the "Explore Gem" section and using a straightforward input box to specify their desired application [7][12].
Group 2: Data Analysis and Visualization
- The platform supports a wide range of input formats, including CSV files and YouTube videos, and also allows recording web operations and drawing doodles [15][19].
- Detailed analysis results from uploaded training data include visual dashboards, tables, and personalized training suggestions, which can be switched to different languages as needed [17][19].
- The analysis provides insights into training trends, highlighting improvements and declines in various exercises, and offers actionable recommendations for optimizing workouts [19][20].
Group 3: Advanced Editing and Customization
- Users can access an advanced editor to fine-tune their applications, allowing for detailed adjustments to data processing steps and model selections [23][24].
- The editor features a card-based interface where users can add models, preview applications, and modify prompts for better results [23][26].
- Specific models for text, audio, video, and image processing are available, enabling users to customize their applications according to their needs [26][27].
Group 4: Sharing and Collaboration
- The platform includes a sharing feature that allows users to generate links to their applications, enabling others to access and modify them based on their Google account permissions [36][38].
- The integration of various AI products into Gemini indicates a significant consolidation of Google's AI capabilities, enhancing the overall user experience and functionality [38].
ByteDance Seedance 1.5 Pro, Hands-On by 藏师傅: A Video Model with Joint Audio and Video Output That Can Speak Dialects
歸藏的AI工具箱· 2025-12-18 04:38
Core Viewpoint - ByteDance has released the Seedance 1.5 Pro video generation model, which significantly enhances audio-visual synchronization and local dialect support, improving the realism and emotional expression in generated videos [1][36].
Group 1: Key Features of Seedance 1.5 Pro
- The model generates audio and video together, with improved lip-sync and tone alignment that is particularly effective for various dialects [3][4].
- Enhanced semantic understanding allows the model to better interpret narrative contexts, improving emotional control and professional performance [3][12].
- The model offers precise and rich camera control, enabling complex shots such as long takes and zooms [3][26].
- It can generate videos of varying lengths, with a maximum of 12 seconds in a single output [3].
Group 2: Dialect and Cultural Relevance
- The ability to generate dialect content is crucial for adding authenticity and regional characteristics to characters in film and television [5][12].
- The model has shown impressive results in generating dialects like Shaanxi and Sichuan, maintaining the unique phonetic qualities and emotional tones [7][9][11].
Group 3: Emotional and Performance Capabilities
- The model demonstrates strong emotional expression, effectively conveying complex feelings such as fear and desperation through facial expressions and voice modulation [20][21].
- It can generate realistic animal sounds and expressions, enhancing the appeal of pet-related content [15][17].
Group 4: Technical Advancements
- The model has improved its ability to handle complex camera movements, including advanced techniques like the Hitchcock zoom, achieving smooth transitions and maintaining visual consistency [29][30][32].
- The integration of audio capabilities with high-quality text-to-video generation has significantly reduced the complexity of video production [36][37].
Group 5: Market Implications
- The advancements in Seedance 1.5 Pro are expected to lead to a surge in video generation products and video agent applications, making it easier for users to create high-quality content [37].
Medeo Tutorial: Mindless One-Shot Gacha Generation Won't Cut It; What a Real Video Agent Should Look Like
歸藏的AI工具箱· 2025-12-15 23:06
Core Insights - The article introduces the significant advancements of Medeo's 1.0 version, highlighting its flexibility and improved capabilities in AI video generation, making it a leader in its category [1][58][62].
Group 1: Medeo's Features
- Medeo 1.0 supports natural language modifications, allowing users to input concise prompts and generate high-quality videos across various styles and categories [1][4].
- The platform offers a user-friendly interface with templates that include visual styles, scripts, editing methods, and music, making it accessible even for beginners [5][6].
- Users can customize video formats, lengths, and styles, and upload materials directly from URLs or personal files [6][8].
Group 2: Video Creation Process
- The video creation process is initiated by simply describing the desired output, with Medeo capable of understanding and executing modifications based on user feedback [7][8].
- Medeo utilizes a context system to match user instructions with relevant video production contexts, enhancing the overall editing experience [62][65].
- The platform can intelligently decide when to use different models for image and video generation, optimizing the production process [10][62].
Group 3: Use Cases and Examples
- The article showcases various video examples created using Medeo, including educational content about the Falcon 9 rocket and promotional videos for unique products [2][3][32].
- Specific prompts and templates are provided for creating videos in different styles, such as miniature model aesthetics and lifestyle product advertisements [25][40].
- The article emphasizes the collaborative nature of prompt creation between users and Medeo, allowing for iterative improvements and refinements [47][56].
Group 4: Future Prospects
- Medeo is currently in beta testing and is expected to launch fully soon, with a large number of activation codes available for users [68][70].
- The article encourages users to engage with the platform and share their creations, indicating a community-driven approach to content generation [70][71].
Gemini 3 + Nano Banana Pro + 3D Generation + Gesture Control = ? 藏师傅 Shows You a Slick Way to Showcase Your Sports Achievements
歸藏的AI工具箱· 2025-12-05 12:02
Core Viewpoint - The article discusses the creation of personalized 3D models and posters for outdoor activities such as hiking, skiing, cycling, and camping, utilizing the Nano Banana Pro tool to showcase achievements while maintaining privacy [4][6][8].
Group 1: Skiing
- The skiing poster design involves creating a visual representation of ski tracks on a snow-covered mountain, integrating user-uploaded images of ski equipment to enhance the visual appeal [10][11].
- The atmosphere is emphasized with strong reflections and a snowy forest backdrop, creating a dynamic and engaging scene [11][12].
- The final output includes a title, data from uploaded images, and a short phrase related to the skiing experience [13].
Group 2: Cycling
- The cycling poster design focuses on a 3D terrain model featuring a prominent local landmark, with a clear road path illustrating the cycling route [16][17].
- User-uploaded images of bicycles are incorporated into the design, ensuring accurate representation of colors and features [16].
- The visual style includes a shallow depth of field and morning light effects, enhancing the overall aesthetic [17][18].
Group 3: Hiking
- The hiking poster design highlights a local landmark with a winding path, integrating user-uploaded images of hiking gear to symbolize the hiking experience [21][22].
- The atmosphere is crafted with a dreamlike quality, featuring elements like mist and reflections on water surfaces [21].
- The final design includes a title, data from uploaded images, and specific geographic coordinates [23].
Group 4: Camping
- The camping poster design showcases a local landscape with a focus on the camping setup, using user-uploaded images of tents and camping gear [25][26].
- The scene is set in a night mode with warm lighting effects emanating from the tent, creating a cozy atmosphere [26][27].
- The final output includes a title, data on elevation, temperature, and camping duration, along with a poetic phrase about the camping experience [28].
Group 5: 3D Model Creation
- The article explains the process of converting images into 3D models using tools like tripo3d.ai or hyper3d.ai, emphasizing the simplicity of the operation [31][33].
- Users are instructed to download the generated models in GLB format for compatibility [33].
- The final step involves uploading the 3D model and associated data to a platform for interactive display, including gesture control features [36][38].
Group 6: Product Development
- The article outlines the straightforward process of building a webpage to showcase 3D models and data visualizations, highlighting the ease of use of the Gemini 3 Pro tool [40][41].
- The design aims for a clean, minimalistic aesthetic while incorporating interactive elements for user engagement [41].
- The article encourages sharing experiences and creations within the outdoor community [42][43].
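Since Group 5 has users download models in GLB format before uploading them for display, a quick sanity check on the downloaded file can catch a wrong export early. GLB is the binary container for glTF, and it starts with a 12-byte little-endian header: the ASCII magic `glTF`, a version number, and the total file length. The sketch below reads that header; the function name is hypothetical and the article itself does not describe any such validation step.

```python
import struct

GLB_MAGIC = b"glTF"   # first four bytes of every GLB file
GLB_HEADER_SIZE = 12  # magic (4 bytes) + version (uint32) + length (uint32)


def read_glb_header(data: bytes) -> tuple[int, int]:
    """Return (version, declared_length) from the start of a GLB file."""
    if len(data) < GLB_HEADER_SIZE:
        raise ValueError("too short to be a GLB file")
    # "<4sII": little-endian, 4 raw bytes followed by two unsigned 32-bit ints.
    magic, version, length = struct.unpack("<4sII", data[:GLB_HEADER_SIZE])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes; this is not a GLB file")
    return version, length
```

A typical use would be `read_glb_header(Path("model.glb").read_bytes())` and comparing the declared length with the actual file size; a mismatch suggests a truncated download.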
Video Enters the Editable Era: 藏师傅 Walks You Through Kling O1, the Banana of Video
歸藏的AI工具箱· 2025-12-02 05:18
Core Viewpoint - The article introduces the launch of Kling (可灵) O1, a unified video and image generation and editing tool that integrates multiple tasks into a single interface, allowing for seamless video and image editing and generation.
Group 1: Features of O1
- O1 integrates multi-modal video models, combining reference videos, text-to-video, frame manipulation, content addition/removal, and style redrawing into a one-stop solution for generation and modification [2].
- It supports multi-modal inputs including images, videos, subjects, and text, enabling precise editing through natural language without the need for masks or keyframes [2][4].
- The tool maintains consistency in character, props, and scene features across shots through multi-angle subjects and reference materials, ensuring coherent visuals [2].
Group 2: Editing Capabilities
- Users can generate narrative shots lasting approximately 3 to 10 seconds, allowing for flexible control over pacing and shot length [2].
- The editing process allows for direct modifications through text prompts, where users can upload videos and specify changes using references [4][6].
- O1 supports the use of single or multiple reference images for background or character modifications, enhancing the realism of the final output [7].
Group 3: Subject Creation and Consistency
- O1 introduces a new element called "subject," which allows users to create and select characters for easier integration into videos without frequent uploads [10][13].
- Users can upload multiple images from different angles to improve consistency in character and scene representation during video generation [13][17].
- The tool is particularly beneficial for e-commerce, as it ensures that products remain consistent in appearance during various camera movements [17].
Group 4: Style and Frame Generation
- O1 allows users to convert video styles easily, supporting various artistic styles such as felt, anime, and 8-bit pixel [19].
- The tool also supports frame generation, enabling users to create complex effects by combining image references with frame inputs [20][21].
- The overall capabilities of O1 in video editing are seen as a significant advancement, with the potential for creating impressive effects with minimal effort [29].
藏师傅 Uses Nano Banana Pro to Take You Anywhere You Want to Go
歸藏的AI工具箱· 2025-11-25 12:59
Core Insights
- The article discusses the capabilities of the newly released Nano Banana Pro, particularly its ability to generate location-specific images based on geographical coordinates [1][2].
- It highlights the integration of real-time data such as current time and weather conditions to enhance the realism of generated images [2][11].
- The article introduces various features of the product, including a "Travel Portrait" function that allows users to create personalized images at chosen locations [13][15].
Feature Overview
- The Nano Banana Pro can generate images in two modes: Scenery mode for landscape photos and Travel Portrait mode for personalized images [8][13].
- Users can upload their own photos to create customized images that reflect the current weather and time at the selected location [15][18].
- The product includes a "Time Machine" feature that allows users to simulate images from different historical periods or alternate realities [20][21].
Additional Functionalities
- The "Prank Mode" feature adds unexpected elements to the generated images, enhancing the fun aspect of the application [23].
- The article emphasizes the potential for creative combinations of prompts to yield unique and imaginative results [25].
- Users can quickly generate images using preset examples available on the platform [28].
Usage Instructions
- The article provides guidance on accessing the product through various channels, including AI Studio, Poe, and Youware, each with different functionalities and requirements [30].
- Users can obtain geographical coordinates from Google Maps to create images that reflect specific locations and conditions [31].
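The coordinate-driven generation described above comes down to assembling a prompt from a latitude/longitude pair plus the local time and weather. The sketch below shows one way to do that with basic range validation. It is an illustration only: the `Scene` type, the `build_prompt` helper, and the prompt wording are hypothetical (the summary does not quote the author's actual prompt), and no image model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    latitude: float   # degrees, -90 to 90
    longitude: float  # degrees, -180 to 180
    local_time: str   # e.g. "18:40", as supplied by the caller
    weather: str      # e.g. "light rain", as supplied by the caller


def build_prompt(scene: Scene) -> str:
    """Compose a location-specific image prompt (hypothetical wording)."""
    if not -90.0 <= scene.latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90 degrees")
    if not -180.0 <= scene.longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    return (
        "A realistic photo taken at coordinates "
        f"({scene.latitude:.4f}, {scene.longitude:.4f}), "
        f"local time {scene.local_time}, weather: {scene.weather}."
    )
```

Validating the ranges up front is a small design choice that guards against the most common copy-paste mistake with map coordinates, which is swapping latitude and longitude.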
What Sparks Will Fly Between Nano Banana Pro and the Top-Tier Design Agent Lovart?
歸藏的AI工具箱· 2025-11-22 12:50
Core Viewpoint - Google has launched the optimized Nano Banana Pro model based on Gemini 3, significantly enhancing its capabilities and addressing multilingual issues [2]
Group 1: Lovart's Free Activity
- Lovart is running a promotion from November 21 to November 23: users who subscribe to a Basic or higher membership can use Nano Banana Pro without spending points for 365 days [3]
- Existing Basic and higher-level members automatically receive the same 365 days of unlimited access to Nano Banana Pro [3]
Group 2: Usage Instructions
- To avoid point deductions, users are advised to operate within the canvas, which allows direct model selection and image uploads without invoking other models [5]
- Users can specify the model by typing the "@" symbol followed by the model name in the input box [7]
- Alternatively, users can pick the desired model from the model selection icon in the input area [9]
Group 3: Case Studies
- A notable application involves combining anime characters with realistic scenes, creating visually striking images [11]
- The process has been simplified to generate a realistic environment first and then add anime characters, avoiding the issue of the entire scene becoming anime-styled [15]
- The model can generate images based on specific geographic coordinates, incorporating real-time weather and time information to enhance realism [19][20]
Group 4: Enhanced PPT Generation
- Lovart can generate PowerPoint presentations with greater flexibility than NotebookLM, allowing users to create entire sets of slides from prompts [30]
- The article outlines various styles for PPT generation, including hand-drawn, minimalist, and themed designs, with consistency maintained across slides [36][41]
- The model's ability to generate high-resolution images results in clearer text and fewer rendering issues compared to competitors [47]
Group 5: Model and Agent Synergy
- The integration with Lovart enhances the capabilities of the Nano Banana Pro model, improving batch generation, consistency, and the ability to leverage more features [48]