歸藏的AI工具箱
Wild! Google Quietly Slipped an N8N into Gemini
歸藏的AI工具箱· 2025-12-19 09:28
Core Insights
- Google has updated its Gemini platform, enhancing its capabilities to generate web applications with interfaces, supporting various inputs like images and documents, and utilizing all Google models, making it significantly more powerful than before [2][6].

Group 1: New Features and Functionalities
- The updated Gemini allows users to create web applications that can analyze data, such as screen time usage, and present it in a visually appealing format, including text analysis and audio blogs [4][19].
- The integration of Opal, a tool similar to N8N, into Gemini simplifies the process of building applications, making it more user-friendly [6][21].
- Users can easily create new Gems by navigating to the "Explore Gem" section and using a straightforward input box to specify their desired application [7][12].

Group 2: Data Analysis and Visualization
- The platform supports a wide range of file formats for input, including CSV files and YouTube videos, and even allows for recording web operations and doodles [15][19].
- Detailed analysis results from uploaded training data include visual dashboards, tables, and personalized training suggestions, which can be modified to different languages as needed [17][19].
- The analysis provides insights into training trends, highlighting improvements and declines in various exercises, and offers actionable recommendations for optimizing workouts [19][20].

Group 3: Advanced Editing and Customization
- Users can access an advanced editor to fine-tune their applications, allowing for detailed adjustments to data processing steps and model selections [23][24].
- The editor features a card-based interface where users can add models, preview applications, and modify prompts for better results [23][26].
- Specific models for text, audio, video, and image processing are available, enabling users to customize their applications according to their needs [26][27].

Group 4: Sharing and Collaboration
- The platform includes a sharing feature that allows users to generate links to their applications, enabling others to access and modify them based on their Google account permissions [36][38].
- The integration of various AI products into Gemini indicates a significant consolidation of Google's AI capabilities, enhancing the overall user experience and functionality [38].
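The training-trend analysis mentioned under Group 2 can be illustrated with a small sketch. This is not how Gemini or Opal computes its results; it is a minimal stand-alone example of the underlying idea, assuming a hypothetical CSV layout with `date`, `exercise`, and `weight_kg` columns (none of these names come from the article).

```python
import csv
import io
from collections import defaultdict


def exercise_trends(csv_text):
    """Return {exercise: change}, comparing each exercise's first and last logged weight."""
    history = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        history[row["exercise"]].append((row["date"], float(row["weight_kg"])))
    trends = {}
    for exercise, entries in history.items():
        entries.sort()  # ISO dates (YYYY-MM-DD) sort chronologically as strings
        trends[exercise] = entries[-1][1] - entries[0][1]
    return trends


# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE = """date,exercise,weight_kg
2025-11-01,squat,80
2025-11-15,squat,90
2025-11-01,bench_press,60
2025-11-15,bench_press,55
"""

if __name__ == "__main__":
    for name, delta in exercise_trends(SAMPLE).items():
        label = "improved" if delta > 0 else "declined" if delta < 0 else "unchanged"
        print(f"{name}: {label} by {abs(delta):.1f} kg")
```

A first-versus-last comparison is the simplest possible trend measure; a real dashboard would more likely fit a slope or use moving averages.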
ByteDance Seedance 1.5 Pro, Hands-On Test by 藏师傅: A Video Model with Joint Audio-Video Generation That Can Speak Dialects
歸藏的AI工具箱· 2025-12-18 04:38
Core Viewpoint
- ByteDance has released the Seedance 1.5 Pro video generation model, which significantly enhances audio-visual synchronization and local dialect support, improving the realism and emotional expression in generated videos [1][36].

Group 1: Key Features of Seedance 1.5 Pro
- The model supports audio-visual synchronization generation, with improved lip-sync and tone alignment capabilities, particularly effective for various dialects [3][4].
- Enhanced semantic understanding allows the model to better interpret narrative contexts, improving emotional control and professional performance [3][12].
- The model offers precise and rich camera control, enabling complex shots such as long takes and zooms [3][26].
- It can generate videos of varying lengths, with a maximum of 12 seconds in a single output [3].

Group 2: Dialect and Cultural Relevance
- The ability to generate dialect content is crucial for adding authenticity and regional characteristics to characters in film and television [5][12].
- The model has shown impressive results in generating dialects like Shaanxi and Sichuan, maintaining the unique phonetic qualities and emotional tones [7][9][11].

Group 3: Emotional and Performance Capabilities
- The model demonstrates strong emotional expression, effectively conveying complex feelings such as fear and desperation through facial expressions and voice modulation [20][21].
- It can generate realistic animal sounds and expressions, enhancing the appeal of pet-related content [15][17].

Group 4: Technical Advancements
- The model has improved its ability to handle complex camera movements, including advanced techniques like the Hitchcock zoom, achieving smooth transitions and maintaining visual consistency [29][30][32].
- The integration of audio capabilities with high-quality text-to-video generation has significantly reduced the complexity of video production [36][37].

Group 5: Market Implications
- The advancements in Seedance 1.5 Pro are expected to lead to a surge in video generation products and video agent applications, making it easier for users to create high-quality content [37].
Medeo Tutorial: Mindless One-Shot "Gacha" Generation Is Not the Way, So What Should a Real Video Agent Look Like?
歸藏的AI工具箱· 2025-12-15 23:06
Core Insights
- The article introduces the significant advancements of Medeo's 1.0 version, highlighting its flexibility and improved capabilities in AI video generation, making it a leader in its category [1][58][62].

Group 1: Medeo's Features
- Medeo 1.0 supports natural language modifications, allowing users to input concise prompts and generate high-quality videos across various styles and categories [1][4].
- The platform offers a user-friendly interface with templates that include visual styles, scripts, editing methods, and music, making it accessible even for beginners [5][6].
- Users can customize video formats, lengths, and styles, and upload materials directly from URLs or personal files [6][8].

Group 2: Video Creation Process
- The video creation process is initiated by simply describing the desired output, with Medeo capable of understanding and executing modifications based on user feedback [7][8].
- Medeo utilizes a context system to match user instructions with relevant video production contexts, enhancing the overall editing experience [62][65].
- The platform can intelligently decide when to use different models for image and video generation, optimizing the production process [10][62].

Group 3: Use Cases and Examples
- The article showcases various video examples created using Medeo, including educational content about the Falcon 9 rocket and promotional videos for unique products [2][3][32].
- Specific prompts and templates are provided for creating videos in different styles, such as miniature model aesthetics and lifestyle product advertisements [25][40].
- The article emphasizes the collaborative nature of prompt creation between users and Medeo, allowing for iterative improvements and refinements [47][56].

Group 4: Future Prospects
- Medeo is currently in beta testing and is expected to launch fully soon, with a large number of activation codes available for users [68][70].
- The article encourages users to engage with the platform and share their creations, indicating a community-driven approach to content generation [70][71].
Gemini 3 + Nano Banana Pro + 3D Generation + Gesture Control = ? 藏师傅 Shows You How to Present Your Workout Achievements in Style
歸藏的AI工具箱· 2025-12-05 12:02
Core Viewpoint
- The article discusses the creation of personalized 3D models and posters for outdoor activities such as hiking, skiing, cycling, and camping, utilizing the Nano Banana Pro tool to showcase achievements while maintaining privacy [4][6][8].

Group 1: Skiing
- The skiing poster design involves creating a visual representation of ski tracks on a snow-covered mountain, integrating user-uploaded images of ski equipment to enhance the visual appeal [10][11].
- The atmosphere is emphasized with strong reflections and a snowy forest backdrop, creating a dynamic and engaging scene [11][12].
- The final output includes a title, data from uploaded images, and a short phrase related to the skiing experience [13].

Group 2: Cycling
- The cycling poster design focuses on a 3D terrain model featuring a prominent local landmark, with a clear road path illustrating the cycling route [16][17].
- User-uploaded images of bicycles are incorporated into the design, ensuring accurate representation of colors and features [16].
- The visual style includes a shallow depth of field and morning light effects, enhancing the overall aesthetic [17][18].

Group 3: Hiking
- The hiking poster design highlights a local landmark with a winding path, integrating user-uploaded images of hiking gear to symbolize the hiking experience [21][22].
- The atmosphere is crafted with a dreamlike quality, featuring elements like mist and reflections on water surfaces [21].
- The final design includes a title, data from uploaded images, and specific geographic coordinates [23].

Group 4: Camping
- The camping poster design showcases a local landscape with a focus on the camping setup, using user-uploaded images of tents and camping gear [25][26].
- The scene is set in a night mode with warm lighting effects emanating from the tent, creating a cozy atmosphere [26][27].
- The final output includes a title, data on elevation, temperature, and camping duration, along with a poetic phrase about the camping experience [28].

Group 5: 3D Model Creation
- The article explains the process of converting images into 3D models using tools like tripo3d.ai or hyper3d.ai, emphasizing the simplicity of the operation [31][33].
- Users are instructed to download the generated models in GLB format for compatibility [33].
- The final step involves uploading the 3D model and associated data to a platform for interactive display, including gesture control features [36][38].

Group 6: Product Development
- The article outlines the straightforward process of building a webpage to showcase 3D models and data visualizations, highlighting the ease of use of the Gemini 3 Pro tool [40][41].
- The design aims for a clean, minimalistic aesthetic while incorporating interactive elements for user engagement [41].
- The article encourages sharing experiences and creations within the outdoor community [42][43].
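Since Group 5 has users download the generated model as a GLB file before uploading it, a quick sanity check on the download can save a failed upload. The sketch below is not part of the article's workflow; it only relies on the binary glTF container layout (a 12-byte header of magic, version, and total length, all little-endian 32-bit integers). The function names are illustrative.

```python
import struct

GLB_MAGIC = 0x46546C67  # the ASCII bytes "glTF" read as a little-endian uint32


def read_glb_header(data):
    """Parse the 12-byte GLB header; raise ValueError if the data is not GLB."""
    if len(data) < 12:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack("<III", data[:12])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes")
    return {"version": version, "declared_length": length}


def looks_complete(data):
    """True if the length declared in the header matches the bytes actually present."""
    return read_glb_header(data)["declared_length"] == len(data)
```

To check a downloaded file, read it with `open(path, "rb").read()` and pass the bytes to `looks_complete`; a mismatch usually means a truncated download.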
Video Enters the Editable Era: 藏师傅 Walks You Through Kling (可灵) O1, the Banana of Video
歸藏的AI工具箱· 2025-12-02 05:18
Core Viewpoint
- The article introduces the launch of Kling (可灵) O1, a unified video and image generation and editing tool that integrates multiple tasks into a single interface, allowing for seamless video and image editing and generation.

Group 1: Features of O1
- O1 integrates multi-modal video models, combining reference videos, text-to-video, frame manipulation, content addition/removal, and style redrawing into a one-stop solution for generation and modification [2].
- It supports multi-modal inputs including images, videos, subjects, and text, enabling precise editing through natural language without the need for masks or keyframes [2][4].
- The tool maintains consistency in character, props, and scene features across shots through multi-angle subjects and reference materials, ensuring coherent visuals [2].

Group 2: Editing Capabilities
- Users can generate narrative shots lasting approximately 3 to 10 seconds, allowing for flexible control over pacing and shot length [2].
- The editing process allows for direct modifications through text prompts, where users can upload videos and specify changes using references [4][6].
- O1 supports the use of single or multiple reference images for background or character modifications, enhancing the realism of the final output [7].

Group 3: Subject Creation and Consistency
- O1 introduces a new element called "subject," which allows users to create and select characters for easier integration into videos without frequent uploads [10][13].
- Users can upload multiple images from different angles to improve consistency in character and scene representation during video generation [13][17].
- The tool is particularly beneficial for e-commerce, as it ensures that products remain consistent in appearance during various camera movements [17].

Group 4: Style and Frame Generation
- O1 allows users to convert video styles easily, supporting various artistic styles such as felt, anime, and 8-bit pixel [19].
- The tool also supports frame generation, enabling users to create complex effects by combining image references with frame inputs [20][21].
- The overall capabilities of O1 in video editing are seen as a significant advancement, with the potential for creating impressive effects with minimal effort [29].
藏师傅 Uses Nano Banana Pro to Take You Anywhere You Want to Go
歸藏的AI工具箱· 2025-11-25 12:59
Core Insights
- The article discusses the capabilities of the newly released Nano Banana Pro, particularly its ability to generate location-specific images based on geographical coordinates [1][2].
- It highlights the integration of real-time data such as current time and weather conditions to enhance the realism of generated images [2][11].
- The article introduces various features of the product, including a "Travel Portrait" function that allows users to create personalized images at chosen locations [13][15].

Feature Overview
- The Nano Banana Pro can generate images in two modes: Scenery mode for landscape photos and Travel Portrait mode for personalized images [8][13].
- Users can upload their own photos to create customized images that reflect the current weather and time at the selected location [15][18].
- The product includes a "Time Machine" feature that allows users to simulate images from different historical periods or alternate realities [20][21].

Additional Functionalities
- The "Prank Mode" feature adds unexpected elements to the generated images, enhancing the fun aspect of the application [23].
- The article emphasizes the potential for creative combinations of prompts to yield unique and imaginative results [25].
- Users can quickly generate images using preset examples available on the platform [28].

Usage Instructions
- The article provides guidance on accessing the product through various channels, including AI Studio, Poe, and Youware, each with different functionalities and requirements [30].
- Users can obtain geographical coordinates from Google Maps to create images that reflect specific locations and conditions [31].
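The coordinate-based workflow described above can be sketched as a small helper that turns a "latitude, longitude" string copied from a map application into a prompt. The prompt wording, function names, and parameters here are hypothetical illustrations; the article's actual prompts are not reproduced in this summary.

```python
def parse_coordinates(text):
    """Parse a "lat, lng" string (as copied from a map app) into two floats."""
    lat_str, lng_str = text.split(",")
    lat, lng = float(lat_str), float(lng_str)
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError(f"coordinates out of range: {text!r}")
    return lat, lng


def build_location_prompt(coords_text, local_time, weather):
    """Assemble an image prompt; the wording is an assumption, not the article's."""
    lat, lng = parse_coordinates(coords_text)
    return (
        f"Generate a photorealistic scene at latitude {lat:.5f}, "
        f"longitude {lng:.5f}. Local time: {local_time}. Weather: {weather}."
    )


if __name__ == "__main__":
    print(build_location_prompt("35.65860, 139.74540", "18:30", "light rain"))
```

Validating the range up front catches the common mistake of pasting longitude and latitude in the wrong order for high-longitude locations.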
What Sparks Will Fly When Nano Banana Pro Meets the Top-Tier Design Agent Lovart?
歸藏的AI工具箱· 2025-11-22 12:50
Core Viewpoint
- Google has launched the optimized Nano Banana Pro model based on Gemini 3, significantly enhancing its capabilities and addressing multilingual issues [2].

Group 1: Lovart's Free Activity
- Lovart is offering free access to Nano Banana Pro from November 21 to November 23, allowing all users to utilize the model without points for 365 days upon subscribing to Basic or higher membership [3].
- Existing Basic and higher-level members will automatically receive the same 365-day unlimited access to Nano Banana Pro [3].

Group 2: Usage Instructions
- To avoid point deductions, users are advised to operate within the canvas, which allows direct model selection and image uploads without invoking other models [5].
- Users can specify the model by using the "@" symbol followed by the model name in the input box [7].
- Another method involves selecting the desired model from the model selection icon in the input area, streamlining the process [9].

Group 3: Case Studies
- A notable application involves combining anime characters with realistic scenes, creating visually striking images [11].
- The process has been simplified to generate a realistic environment first and then add anime characters, avoiding the issue of the entire scene becoming anime-styled [15].
- The model can generate images based on specific geographic coordinates, incorporating real-time weather and time information to enhance realism [19][20].

Group 4: Enhanced PPT Generation
- Lovart can generate PowerPoint presentations with greater flexibility compared to NotebookLM, allowing users to create entire sets of slides based on prompts [30].
- Various styles for PPT generation have been outlined, including hand-drawn, minimalist, and themed designs, ensuring consistency across slides [36][41].
- The model's ability to generate high-resolution images results in clearer text and fewer rendering issues compared to competitors [47].

Group 5: Model and Agent Synergy
- The integration of Lovart enhances the capabilities of the Nano Banana Pro model, improving batch generation, consistency, and the ability to leverage more features [48].
The Top Unorthodox Tinkerer Takes On Nano Banana Pro Again: Tons of Ways to Play, This Thing Is Wild!
歸藏的AI工具箱· 2025-11-20 17:30
Core Insights
- The article discusses the capabilities of the newly released Nano Banana Pro model, highlighting its advanced features in image generation and editing, particularly its support for real-time knowledge and reasoning, which significantly enhances its functionality [2][69].

Group 1: Model Capabilities
- The Nano Banana Pro model has improved world knowledge and reasoning abilities, allowing it to generate accurate visual content based on real-time information [5][69].
- It can create detailed UI designs, such as a weather UI based on current weather data, showcasing its ability to integrate multiple elements and maintain consistency across images [9][11].
- The model supports multi-language capabilities, including strong performance in Chinese, enabling it to generate complex content with mixed languages without errors [14][15][17].

Group 2: Image Generation and Design
- The model can generate high-quality collages and themed designs, maintaining the integrity of uploaded images while adding creative elements like handwritten notes and artistic fonts [20][22][24].
- It demonstrates strong consistency in product design, effectively transferring details from original images to new designs, which is crucial for e-commerce applications [27][29].
- The model's ability to adapt to various styles and themes is evident in its capacity to create modern and abstract designs, enhancing the overall aesthetic quality of generated images [57][60].

Group 3: User Applications and Accessibility
- The Nano Banana Pro is integrated into various applications such as Lovart, Listenhub, and Flowith, making it widely accessible for users [67].
- Users can access a free version of the model through the Gemini app, although with limited resolution, while premium features are available for paid users [67][69].
- The rapid development and enhancement of the model within a few months reflect the company's commitment to innovation in AI-driven image generation [69].
Slower and Deeper | 藏师傅 Helps You See Gemini 3's Real Capabilities
歸藏的AI工具箱· 2025-11-19 08:04
Core Insights
- The article discusses the performance of Gemini 3, highlighting its state-of-the-art (SOTA) capabilities across various benchmarks, significantly outperforming competitors in most categories [1][2].

Benchmark Performance
- Gemini 3 Pro achieved the highest scores in several benchmarks, including:
  - 91.9% in GPQA Diamond for scientific knowledge [2]
  - 95.0% in AIME 2025 for mathematics without tools [2]
  - 100% in AIME 2025 with code execution [2]
  - 87.6% in Video-MMMU for knowledge acquisition from videos [2]
  - 2,439 Elo Rating in LiveCodeBench Pro for competitive coding [2]
- In the ARC-AGI-2 visual reasoning puzzles, Gemini 3 scored 31.1%, significantly higher than its competitors [2].

Multimodal Understanding
- The article emphasizes Gemini 3's strong multimodal understanding capabilities, particularly in analyzing video content and generating detailed summaries [6][8].
- It successfully analyzed a complex video, providing detailed insights into each scene and suggesting design tools for implementation [7][8].

Design and Coding Capabilities
- Gemini 3 demonstrated advanced design capabilities by generating a complete design agent platform that can autonomously create images and videos based on user prompts [12][14].
- The AI was able to replicate complex design tasks, including logo design and packaging, showcasing its potential for practical applications in design [14][20].

Interactive Content Generation
- The AI's ability to generate interactive content was highlighted, with examples of creating interactive games and visual novels based on user-provided scripts [34][36].
- This capability opens up new opportunities for content creation, allowing users to develop engaging narratives and gameplay experiences with minimal input [35].

Technical Implementation
- The article provides detailed prompts for users to leverage Gemini 3's capabilities in web development, including creating a storytelling webpage and generating 3D voxel animations from images [26][44].
- The technical requirements emphasize the use of modern web technologies, ensuring that the generated content is visually appealing and functionally robust [28][43].
Alibaba's "Blitz" Strikes Again, This Time with the Qianwen App
歸藏的AI工具箱· 2025-11-17 04:04
Core Insights
- Alibaba's influence in the AI sector is significant, being one of the few companies capable of competing with Google and OpenAI in both model variety and capability [1].
- The recently released Qwen3-Max model demonstrates strong capabilities, ranking just below the leading models from major overseas competitors, while the open-source Qwen3-235B is the top open-source model on Lmarena [1].
- Alibaba has developed a comprehensive suite of AI models, covering a wide range of applications including video generation, translation, image editing, and more, positioning itself as a formidable competitor in the AI landscape [4][7].

Model Performance and Popularity
- Qwen models dominate the download rankings on Huggingface, with over half of the top ten models being Qwen variants, indicating their popularity and acceptance in the community [2].
- The Qwen3-Max model scored 1432 in evaluations, showcasing its competitive edge against other proprietary models [2].

Application Features
- The newly launched Qwen-based Qianwen app serves as a primary entry point for users, integrating various AI capabilities to perform common tasks effectively [8].
- The app offers a user-friendly design, allowing users to trigger functions using natural language, making it accessible to a broader audience [10].
- Key features include image recognition, real-time translation, and comprehensive health report analysis, demonstrating the app's versatility [20][24][25].

User Experience and Accessibility
- The Qianwen app provides free access to its features, including video generation with a daily limit of 15 uses, making it appealing to everyday users [12][43].
- Users can generate detailed reports and summaries from complex documents, enhancing the app's utility for personal and professional use [30][31].

Community and Ecosystem Integration
- Alibaba's ecosystem, including platforms like Taobao and DingTalk, enhances the potential for the Qwen models to be integrated into various applications, expanding their reach and functionality [8].
- The app's design and functionality are tailored to meet user needs, with a focus on clarity and ease of use, which is crucial for attracting non-technical users [49].