Gemini API
X @Demis Hassabis
Demis Hassabis · 2025-11-09 23:10
RT Logan Kilpatrick (@OfficialLoganK): Introducing the File Search Tool in the Gemini API, our hosted RAG solution with free storage and free query-time embeddings 💾 We are super excited about this new approach and think it will dramatically simplify the path to context-aware AI systems, more details in 🧵 ...
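The thread itself carries no code, but the shape of a hosted-RAG call is easy to sketch. Below is a minimal, hypothetical example using the google-genai Python SDK; the `FileSearch` tool wiring and the store name are assumptions drawn from the announcement, not confirmed API surface.

```python
# Hypothetical File Search sketch with the google-genai SDK.
# The FileSearch tool type and store name below are assumptions
# based on the announcement, not confirmed API surface.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does our Q3 design doc say about cache invalidation?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(  # assumed tool type
                file_search_store_names=["fileSearchStores/my-project-docs"],
            ),
        )],
    ),
)
print(response.text)  # answer grounded in the uploaded documents
```

The pitch is that chunking, embedding, and retrieval all happen server-side, with storage and query-time embeddings free per the announcement.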
X @Demis Hassabis
Demis Hassabis · 2025-10-18 01:19
RT Logan Kilpatrick (@OfficialLoganK): Introducing grounding with Google Maps in the Gemini API, bringing data about 250 million places and Gemini together to create all-new experiences 🗺️! So powerful to connect things like maps + search together in a single experience : ) https://t.co/X77OcCf4ZR ...
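No code accompanies this tweet either; a plausible sketch follows, modeled on how Google Search grounding is enabled in the google-genai SDK. The `GoogleMaps` tool type is an assumption inferred from the announcement.

```python
# Hypothetical Maps-grounding sketch, modeled on Search grounding in
# the google-genai SDK; the GoogleMaps tool type is an assumption.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Find a cafe near the Ferry Building in San Francisco that "
             "is open after 9pm, and describe the walk from the pier.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_maps=types.GoogleMaps())],  # assumed
    ),
)
print(response.text)  # answer grounded in Maps place data
```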
Just now: the king of AI video gets a major update! Taking on Sora head-on, and Will Smith's noodle-eating looks tastier than ever
创业邦 · 2025-10-16 03:23
Core Insights
- OpenAI recently launched the Sora 2 video generation model, while Google upgraded its model to Veo 3.1, underscoring how competitive AI video generation has become [4][41]

Group 1: Google Veo 3.1 Upgrade
- The upgrade includes enhanced video editing capabilities, allowing users to make more precise adjustments to video segments [5]
- Features such as "Ingredients to Video," "Frames to Video," and "Extend" now incorporate audio, making sound a native part of the creative process [7][11]
- Veo 3.1 shows significant improvements in prompt understanding and audiovisual quality, producing more natural transitions from images to videos [8]

Group 2: User Functionality
- Users can define characters and styles with multiple reference images, which the "Ingredients to Video" feature uses to generate the final scene [13]
- The "Frames to Video" feature produces seamless transitions between a starting and an ending frame, useful for artistic projects (see the sketch after this list) [15]
- The "Extend" feature can generate clips longer than one minute while maintaining narrative continuity with the preceding segments [17]

Group 3: Output Formats and User Engagement
- Veo 3.1 now supports both horizontal and vertical video formats, matching current content consumption trends [19]
- Since the launch of Flow in May, users have created over 275 million videos, prompting new editing features such as "Insert New Elements" and "Remove Objects" for more flexible editing [20]

Group 4: Application Scenarios
- Practical applications of Veo 3 include first-person-perspective videos, ASMR fruit-slicing clips, and night-vision-style monitoring footage [24]
- The model has been used to create product advertisement videos, showcasing its ability to deliver high-quality visual content [30]

Group 5: Performance Comparison
- While Veo 3.1 excels at photorealistic and commercial content, it still has room for improvement in replicating specific artistic styles such as anime [40]
- The rapid iteration of video generation models like Veo 3.1 and Sora 2 points to a fast-evolving market, with potential for broad adoption across content creation platforms [41][42]
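For developers, the Flow features above map onto the Gemini API's video generation surface. A hedged sketch of a "Frames to Video"-style call is below; the Veo 3.1 model ID and the `last_frame` config field are assumptions from the coverage, and the async polling pattern follows the google-genai SDK's long-running-operation convention.

```python
# Hypothetical "Frames to Video" sketch: generate a clip that moves
# from a start frame to an end frame. The model ID and `last_frame`
# field are assumptions from the article, not confirmed parameters.
import time
from google import genai
from google.genai import types

client = genai.Client()

with open("start.png", "rb") as f:
    start = types.Image(image_bytes=f.read(), mime_type="image/png")
with open("end.png", "rb") as f:
    end = types.Image(image_bytes=f.read(), mime_type="image/png")

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed model ID
    prompt="Slow dolly-in, soft morning light, ambient street audio",
    image=start,                        # opening frame
    config=types.GenerateVideosConfig(last_frame=end),  # assumed field
)
while not operation.done:  # generation is asynchronous; poll it
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("transition.mp4")
```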
Just now: Google's Veo 3.1 gets a major update, taking on Sora 2 head-on
机器之心 · 2025-10-16 00:51
Core Insights
- Google has released its latest AI video generation model, Veo 3.1, which improves audio, narrative control, and visual quality over its predecessor, Veo 3 [2][3]
- The new model introduces native audio generation, letting users control the emotional tone and narrative pacing of videos during creation [10]

Enhanced Audio and Narrative Control
- Veo 3.1 improves support for dialogue, environmental sound effects, and other audio elements, making for a more immersive video experience [5]
- Core Flow functions such as "Frames to Video" and "Ingredients to Video" now support native audio generation, and clips can extend beyond the original 8 seconds to 30 seconds or longer [6][9]

Richer Input and Editing Capabilities
- The model accepts text prompts, images, and video clips as input, and supports up to three reference images to guide the final output [12]
- New "Insert" and "Remove" features allow more precise editing, although not all functionality is immediately available through the Gemini API [13]

Multi-Platform Deployment
- Veo 3.1 is accessible through several existing Google AI services and is currently in preview, available only in the paid tier of the Gemini API (a minimal API call is sketched below) [15][16]
- Pricing is consistent with the previous Veo model, charging only after successful video generation, which helps enterprise teams keep budgets predictable [16][21]

Technical Specifications and Output Control
- The model outputs video at 720p or 1080p resolution at 24 frames per second [18]
- Users can upload product images to keep visuals consistent throughout a video, simplifying creative production for branding and advertising [19]

Creative Applications
- Google's Flow platform serves as an AI-assisted movie creation tool, while the Gemini API targets developers who want to integrate video generation into their own applications [20]
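As a concrete reference for the minimal API call mentioned above, here is a hedged text-to-video sketch with the google-genai SDK; the model ID and the `resolution` config field are assumptions chosen to match the 720p/1080p specs in the article.

```python
# Minimal hypothetical Veo 3.1 text-to-video call; model ID and the
# `resolution` field are assumptions matching the article's specs.
import time
from google import genai
from google.genai import types

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed preview model ID
    prompt="A barista pours latte art in slow motion; cafe chatter and "
           "the hiss of a steam wand fill the background.",
    config=types.GenerateVideosConfig(resolution="1080p"),  # or "720p"
)
# Billing only applies on success per the article, but the call is
# still asynchronous: poll the long-running operation until done.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")  # 24 fps output per the article
```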
A "free credit" that instantly became ¥400,000 of debt? A student accidentally leaked his Gemini API key and was saddled with a massive bill; the developer community erupted, and Google ultimately waived the charges
36Kr · 2025-09-28 07:13
Core Points
- A student from Georgia accidentally leaked his Google Cloud Gemini API key on GitHub, leading to a bill of $55,444 in just a few months of malicious usage [1][3][9]
- The incident sparked debate among developers about Google's lack of a hard spending cap and the need for better user protection mechanisms [2][6][8]

Incident Details
- The student registered for Google Cloud with a school email, intending to use the $300 free credit for learning experiments, and had consumed only $80 before the leak [3][4]
- The API key was exposed on June 6, but the student remained unaware until September 7, when another GitHub user alerted him [3][5]
- The bill accumulated in three phases: $732 in June, over $31,000 in August, and a further $21,000 from September 1 to 7 [4][7]

Google's Response
- After discovering the issue, the student contacted Google Cloud support and provided evidence, but Google initially stated that the bill would not be canceled or modified [5][6]
- The student said the bill represented decades of income for him, underscoring the severity of the financial impact [6]

Developer Community Reaction
- The incident triggered widespread discussion among developers, who questioned why Google only sends alerts rather than enforcing a hard spending limit [8]
- Some developers shared their own experiences and suggested best practices for avoiding similar incidents, such as capping API call quotas and scanning repositories for leaked keys (a minimal hygiene sketch follows below) [8]

Resolution
- After sustained attention from the developer community, Google Cloud's billing team reviewed the case again and waived the entire $55,444 bill on September 25 [9]
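The quota-and-scanning advice above starts with a simpler rule: never put the key in code. A minimal sketch of the environment-variable pattern the google-genai SDK already supports:

```python
# Basic key hygiene: load the key from the environment and fail fast
# if it is missing, so no literal key ever lands in a Git history.
import os
from google import genai

api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError(
        "GEMINI_API_KEY is not set; export it rather than hard-coding it."
    )

client = genai.Client(api_key=api_key)
```

Pre-commit scanners such as gitleaks, GitHub's own secret scanning, and per-key quota caps in the Cloud console each bound the blast radius if a key does slip out.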
Google - Communacopia + Technology Conference 2025: Key Takeaways
2025-09-11 12:11
Summary of Alphabet Inc. (GOOGL) Conference Call

Company Overview
- **Company**: Alphabet Inc. (GOOGL)
- **Event**: Communacopia + Technology Conference 2025
- **Presenter**: Google Cloud CEO Thomas Kurian

Key Industry Insights
- **Cloud Adoption**: There is a long runway for cloud adoption and future migrations to the public cloud, driven primarily by organizations seeking to transform their businesses through AI products and solutions offered in the cloud [2][5]
- **AI Systems**: Google Cloud's AI systems are designed for high performance, reliability, and scalability in both training and inference [2][5]
- **Revenue Diversification**: The company has developed a diversified revenue base, with 13 product lines each generating over $1 billion in annual revenue [2][5]

Core Company Strategies
- **Monetization of AI**: Management outlined multiple monetization strategies for AI, including consumption, subscription, increased usage, value-based pricing, and premium upsell [2][5][6]
- **Product Development**: Focus on building domain-specific enterprise agents across five areas: code/data/security, creativity/collaboration, specific application domains, specific industries, and chat & agent platforms [5][6]
- **Generative AI**: Commitment to expanding enterprise access to models, offering a suite of 182 leading models, including large-scale models for generative AI applications [5][6]

Financial Performance and Projections
- **Operating Margins**: Improvement in operating margins and profitability as Google Cloud expands its customer base and product usage [6]
- **Cost Optimization**: Early decisions to develop proprietary chips and models have led to cost optimization and efficiency [6]
- **Price Target**: The 12-month price target for GOOGL is set at $234, with a current price of $239.63, indicating a downside potential of 2.3% [8]

Financial Metrics (Projected)
- **Revenue Growth**: Projected revenues of $295.1 billion in 2025, increasing to $424.4 billion by 2027 [8]
- **EBITDA**: Expected EBITDA growth from $127.7 billion in 2025 to $206.9 billion in 2027 [8]
- **EPS Growth**: Projected EPS growth from $8.04 in 2025 to $11.56 in 2027 [8]

Risks and Challenges
- **Competitive Landscape**: Risks include competition affecting product utility and advertising revenues [7]
- **Market Disruption**: Potential headwinds from industry disruption impacting monetizable search [7]
- **Regulatory Scrutiny**: Exposure to regulatory scrutiny and changes in industry practices that could alter business model prospects [7]
- **Macroeconomic Factors**: Vulnerability to global macroeconomic volatility and investor risk appetite for growth stocks [7]

Conclusion
- **Investment Rating**: The company is rated as a "Buy" with a focus on its strong growth potential in cloud and AI sectors, despite facing various risks and competitive challenges [6][7]
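As a quick check, the quoted 2.3% downside follows directly from the price target and the current price:

```latex
\frac{234.00 - 239.63}{239.63} \approx -0.0235
\quad\Longrightarrow\quad \text{downside} \approx 2.3\%
```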
AI reading web pages really is different this time: Google Gemini unlocks a new "deep web page parsing" skill
机器之心 · 2025-09-02 03:44
Core Viewpoint
- Google is returning to its search roots with the Gemini API's URL Context feature, which lets AI "see" web content the way a human does [1]

Group 1: URL Context Functionality
- URL Context enables the Gemini model to access and process content from URLs, including web pages, PDFs, and images, with a content limit of up to 34MB [1][5]
- Unlike approaches where the AI reads only a summary or fragment of a page, URL Context performs deep, complete document parsing, understanding the entire structure and content [5][6]
- The feature supports a range of file formats, including PDF, PNG, JPEG, HTML, JSON, and CSV [7]

Group 2: Comparison with RAG
- URL Context grounding is a significant simplification over the traditional Retrieval-Augmented Generation (RAG) pipeline, which requires content extraction, chunking, vectorization, and storage [11][12]
- The new approach lets developers get accurate results with minimal code, eliminating the need for an extensive data processing pipeline (see the sketch after this list) [13][14]
- URL Context can extract specific data points from documents, such as financial figures from a PDF, which a summary alone could never surface [14]

Group 3: Operational Mechanism
- URL Context uses a two-step retrieval process to balance speed, cost, and freshness: it first attempts to serve content from an internal index cache [25]
- If the URL is not cached, it scrapes the page in real time [25]
- Pricing is straightforward: you are charged for the tokens processed from the fetched content, which encourages developers to point the model at precise sources [27]

Group 4: Limitations and Industry Trends
- URL Context cannot access paywalled content or specialized destinations such as YouTube videos, and it can process at most 20 URLs per request [29]
- The feature's emergence reflects a broader trend of foundation models absorbing external capabilities, reducing the plumbing that application developers previously had to build [27]
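The sketch referenced in Group 2 above: with the google-genai SDK, URL Context reduces to a single tool declaration. The `UrlContext` type name follows the SDK's conventions but should be treated as an assumption, and the URLs are placeholders.

```python
# Minimal URL Context sketch: put direct URLs in the prompt and enable
# the url_context tool so the model fetches and parses the documents.
# The UrlContext type name is assumed; the URLs are placeholders.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=(
        "Compare the pricing tables on these two pages and summarize "
        "the differences: https://example.com/plans-a and "
        "https://example.com/plans-b"
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(url_context=types.UrlContext())],
    ),
)
print(response.text)
```

Per the article's pricing note, cost scales with the tokens processed from the fetched pages, so pointing the tool at tight, relevant URLs is the main lever.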
Google's Nano Banana is all over the internet; a look at the team behind it
机器之心 · 2025-08-29 04:34
Core Viewpoint
- Google DeepMind has introduced the Gemini 2.5 Flash Image model, which offers native image generation and editing, supports multi-turn dialogue, and maintains scene consistency, marking a significant advance in state-of-the-art (SOTA) image generation [2][30]

Team Behind the Development
- Logan Kilpatrick, a senior product manager at Google DeepMind, leads Google AI Studio and the Gemini API; he previously worked at OpenAI, Apple, and NASA [6][9]
- Kaushik Shivakumar, a research engineer at Google DeepMind, focuses on robotics and multi-modal learning and contributed to Gemini 2.5 [12][14]
- Robert Riachi, another research engineer, specializes in multi-modal AI models, particularly image generation and editing, and has worked on the Gemini series [17][20]
- Nicole Brichtova, the visual generation product lead, emphasizes the integration of generative models across Google products and their potential in creative applications [24][26]
- Mostafa Dehghani, a research scientist, works on machine learning and deep learning and has contributed to major multi-modal model projects [29]

Technical Highlights of Gemini 2.5
- The model delivers advanced image editing while maintaining scene consistency, generating high-quality images quickly [32][34]
- It can creatively interpret vague instructions, letting users iterate over multiple turns without long prompts (a minimal editing call is sketched below) [38][46]
- Text rendering has improved, addressing earlier shortcomings in generating readable text inside images [39][41]
- The model integrates image understanding with generation, learning across modalities including images, video, and audio [43][45]
- An "interleaved generation mechanism" enables pixel-level editing through iterative instructions, improving the user experience [46][49]

Comparison with Other Models
- Gemini aims to integrate all modalities on the path to artificial general intelligence (AGI), distinguishing it from Imagen, which focuses on text-to-image tasks [50][51]
- For tasks where speed and cost matter most, Imagen remains the better choice, while Gemini excels at complex multi-modal workflows and creative scenarios [52]

Future Outlook
- The team expects future models to exhibit greater intelligence, producing results that exceed user expectations even when instructions are not followed to the letter [53]
- There is excitement about future models producing aesthetically pleasing and functional visual content, such as accurate charts and infographics [53]
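The editing call sketched below illustrates the multi-turn, consistency-preserving workflow described above. It is a hypothetical example: the `gemini-2.5-flash-image-preview` model ID is an assumption from the coverage, and the file names are placeholders.

```python
# Hypothetical image-editing sketch for Gemini 2.5 Flash Image
# ("Nano Banana"); the model ID is an assumption from the coverage.
from io import BytesIO
from PIL import Image
from google import genai

client = genai.Client()

source = Image.open("room.png")  # any local test image
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # assumed model ID
    contents=[source, "Repaint the walls sage green and keep everything "
                      "else in the scene exactly as it is."],
)
# The response can interleave text and image parts; keep the images.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("room_edited.png")
    elif part.text:
        print(part.text)
```

Because scene consistency is the model's headline trait, a follow-up turn ("now swap the sofa for a reading chair") would reuse the edited image as the next input.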
X @Demis Hassabis
Demis Hassabis · 2025-08-18 21:37
RT Logan Kilpatrick (@OfficialLoganK): Today we are making URL Context, my favorite Gemini API tool, ready for scaled production use 🔗 The model can now visit webpages, PDFs, images, and more when you provide the direct URL, and you simply pay for the tokens it processes, no additional tool cost! https://t.co/ukuev45pJg ...
X @Demis Hassabis
Demis Hassabis · 2025-08-15 17:27
AI Model Updates & Availability
- Google launched the Imagen 4 Fast model for quick image generation at $0.02 per image [1]
- Imagen 4 and Imagen 4 Ultra now support 2K images and are generally available in the Gemini API and Vertex AI [1]
- Google introduced Gemma 3 270M, a hyper-efficient model for developers to fine-tune [1]

Gemini App Enhancements
- Google AI Ultra subscribers can now run twice as many Deep Think queries, up to 10 prompts per day in the Gemini App [2]
- The Gemini App can now reference past chats for more personalized responses [2]
- Temporary Chats and new privacy settings introduced in the Gemini App [2]

AI Research
- Google Research & Google DeepMind introduced g-AMIE, exploring AI's role in doctor-patient conversations [2]
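For the Imagen 4 Fast launch noted above, a hedged generation sketch with the google-genai SDK follows; the model ID is an assumption consistent with the announcement's naming.

```python
# Hypothetical Imagen 4 Fast sketch; the model ID is an assumption
# consistent with the announcement ($0.02-per-image fast tier).
from io import BytesIO
from PIL import Image
from google import genai
from google.genai import types

client = genai.Client()

result = client.models.generate_images(
    model="imagen-4.0-fast-generate-001",  # assumed model ID
    prompt="A watercolor hummingbird hovering over a hibiscus flower",
    config=types.GenerateImagesConfig(number_of_images=2),
)
for i, generated in enumerate(result.generated_images):
    Image.open(BytesIO(generated.image.image_bytes)).save(f"bird_{i}.png")
```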