Hugging Face
X @TechCrunch
TechCrunch· 2025-09-24 14:33
Explore the full AI Stage at TechCrunch Disrupt 2025 with leaders from Hugging Face, Google Cloud, Wayve, and more. Register now to save up to $668. https://t.co/9avd0lUTH5 ...
AI Engineer Paris 2025 (Day 2)
AI Engineer· 2025-09-23 18:15
AI Engineering & Industry Leaders
- Neo4j's co-founder and CEO discusses "The State of AI Engineering" [1]
- Docker focuses on "Democratizing AI Agents: Building, Sharing, and Securing Made Simple" [1]
- GitHub addresses "Building MCPs at GitHub Scale" [1]
- H Company is assembling open-source bricks for the next generation of AI [1]
- Google DeepMind shares updates on generative AI [1]

AI Infrastructure & Tools
- Koyeb explores "Building for the Agentic Era: The Future of AI Infrastructure" [1]
- Black Forest Labs presents "Inside FLUX, How It Really Works" [1]
- LlamaIndex is building an open-source NotebookLM alternative [1]

Open Source & Community
- Hugging Face reports on the "State of Open LLMs in 2025" [1]

AI Applications & Techniques
- Arize AI studies "System Prompt Learning for Agents" [1]
- ZML is working "Towards unlimited contexts: faster-than-GPU sparse logarithmic attention on CPU" [1]
- Kyutai is scaling real-time voice AI [1]
X @TechCrunch
TechCrunch· 2025-09-18 14:02
Hugging Face co-founder Thomas Wolf joins TechCrunch Disrupt 2025 to share how open-source and moonshot projects are shaping AI. Register now to save. https://t.co/boxacnqqj0 ...
Hugging Face Releases FinePDFs: A 3-Trillion-Token Dataset Built from PDF Documents
AI前线· 2025-09-17 06:17
Core Insights
- Hugging Face has launched FinePDFs, the world's largest public PDF-only corpus, encompassing 475 million documents in 1,733 languages and totaling approximately 3 trillion tokens [2]
- FinePDFs offers unique advantages over traditional HTML-based datasets, particularly high-quality, domain-specific content from legal, academic, and technical writing [2]
- The dataset employs advanced extraction techniques, pairing Docling for text-based extraction with RolmOCR for GPU-driven OCR, to ensure high-quality data processing [2]

Summary by Sections

Dataset Composition
- The dataset includes over 1.1 trillion tokens in English, with Spanish, German, French, Russian, and Japanese each contributing over 100 billion tokens [3]
- Smaller languages are also represented: 978 languages contribute over 1 million tokens each [3]

Performance Evaluation
- Hugging Face trained a 1.67-billion-parameter model on a subset of FinePDFs, achieving performance comparable to the state-of-the-art HTML dataset SmolLM-3 Web [3]
- Combining both datasets significantly improved performance, highlighting the complementary knowledge that PDFs provide [3]

Community Response and Transparency
- The evaluation results have prompted questions in the community about the assessment methodology and scoring [4]
- Hugging Face emphasizes the dataset's potential for advancing long-context training, since PDF documents are typically much longer than web pages [4]
- The dataset is available under an open data sharing license for research and development, hosted on the Hugging Face Hub [4]
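The per-language token tallies described above can be reproduced for any slice of the corpus. A minimal sketch: the whitespace tokenizer, the stand-in sample records, and the `HuggingFaceFW/finepdfs` dataset id and `eng_Latn` config mentioned in the comment are assumptions for illustration, not details from the article.

```python
from collections import Counter

def tokens_per_language(records):
    """Sum an approximate (whitespace-split) token count per language code."""
    totals = Counter()
    for rec in records:
        totals[rec["language"]] += len(rec["text"].split())
    return totals

# Stand-in records for illustration; with the real corpus you would stream it, e.g.:
#   from datasets import load_dataset
#   records = load_dataset("HuggingFaceFW/finepdfs", "eng_Latn", split="train", streaming=True)
sample = [
    {"language": "eng_Latn", "text": "PDF documents are typically longer than web pages."},
    {"language": "deu_Latn", "text": "Ein kurzes Beispiel."},
    {"language": "eng_Latn", "text": "Long contexts help long-context training."},
]
print(tokens_per_language(sample))  # Counter({'eng_Latn': 13, 'deu_Latn': 3})
```

A real tokenizer would give different absolute counts, but the per-language aggregation pattern is the same.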
X @Demis Hassabis
Demis Hassabis· 2025-09-11 01:28
RT Omar Sanseviero (@osanseviero): Out of over 2 million open models, EmbeddingGemma is the top trending model on Hugging Face https://t.co/f60p6xE0RT ...
Live Tonight | Galaxea X Hugging Face! How Will the Open-Source Ecosystem Lead the Future of Embodied Intelligence?
具身智能之心· 2025-08-29 00:05
Core Viewpoint
- The article emphasizes the importance of open-source ecosystems in accelerating the development and deployment of embodied intelligence, highlighting collaborations among industry players and developers [1].

Group 1
- The collaboration between Galaxea and Hugging Face aims to foster a vibrant developer community and explore open-source models and datasets [1][2].
- A live discussion between Thomas Wolf, co-founder of Hugging Face, and Zhao Xing, chief scientist of Galaxea, will explore the future of embodied intelligence and the open-source ecosystem [3][6].
- The live event is scheduled for August 29 at 19:00 [4][10].
How Do the World's 50 Highest-Grossing AI Apps Drive Traffic Growth? | Jinqiu Select
锦秋集· 2025-08-27 14:55
Core Insights
- The article traces the evolution of AI startups from the "model frenzy" of 2023 to the "growth competition" of 2025, emphasizing user acquisition and retention strategies as the basis for sustainable growth [1][2].

Group 1: Growth Strategies
- Companies are increasingly focused on understanding their user acquisition sources, retention strategies, and future growth potential [2][3].
- Converting cold traffic into active users and revenue is crucial for securing future market positions [4].

Group 2: Traffic Sources and Analysis
- A detailed analysis of the top 50 AI startups reveals that brand recognition is a key competitive barrier, with direct traffic a significant indicator of consumer trust and habitual use [14].
- Search traffic serves as a foundational source for nearly all companies, making search engine optimization (SEO) essential for low-cost, stable user growth [14].
- Companies with diverse traffic channels tend to have greater growth potential and resilience against market fluctuations [14].

Group 3: Company-Specific Traffic Insights
- **OpenAI**: Dominated by organic search (58.89%), with direct access at 29.79% and referrals at 9.77%. Paid search is minimal at 0.06% [18][19].
- **Anthropic**: Balanced traffic sources, with organic search at 42.25% and referrals at 11.04%; the company relies heavily on non-paid channels [32].
- **Grammarly**: Exhibits a diverse traffic structure, with direct access at 43.94% and organic search at 42.25%, indicating a strong brand presence [34].
- **Midjourney**: Direct access is the primary source at 65.71%, with organic search contributing 26.84% [42].
- **Dialpad**: Direct access leads at 64.91%, followed by organic search at 24.32%, showcasing effective brand engagement [62].

Group 4: Paid and Referral Traffic
- Paid search is a minor contributor for most companies; **6sense**'s 6.54% from paid sources still indicates a reliance on organic and direct traffic for growth [106].
- Referral traffic varies significantly; **Cleo**'s 2.80% from referrals highlights the importance of partnerships and external visibility [79].

Group 5: Industry Trends
- The analysis indicates a shift toward organic growth strategies over paid advertising, as companies seek sustainable user acquisition methods [14][4].
- The competitive landscape centers on brand loyalty and the ability to convert traffic into long-term users, which is increasingly critical for success in the AI sector [4][14].
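The "diverse traffic channels" claim above can be made concrete with a standard diversity measure. Shannon entropy over the channel-share vector is one illustrative choice, not the article's methodology; the figures are the OpenAI and Midjourney shares reported above (shares need not sum to 100, since minor channels are omitted):

```python
import math

def channel_entropy(shares):
    """Shannon entropy (bits) of a traffic mix; higher means more diversified."""
    total = sum(shares.values())
    probs = [s / total for s in shares.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# Channel shares (%) as reported above.
openai = {"organic search": 58.89, "direct": 29.79, "referral": 9.77, "paid search": 0.06}
midjourney = {"direct": 65.71, "organic search": 26.84}

print(round(channel_entropy(openai), 3))      # more diversified mix
print(round(channel_entropy(midjourney), 3))  # more concentrated mix
```

By this measure OpenAI's mix scores higher than Midjourney's direct-heavy one, matching the article's qualitative reading.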
The Top 100 Most Used AI Apps in 2025
a16z· 2025-08-27 13:00
Consumer AI Trends
- The consumer AI list tracks real-world AI usage by analyzing website traffic (monthly global visits) and mobile app usage (monthly active users) [1]
- Companionship continues to dominate the consumer AI space, with new names like Juicy Chat, Joy, and Rream joining established players [1]
- Vibe coding is a rising trend, with companies like Lovable and Replit gaining traction and showing strong revenue retention (100% or above in the first 3 months for many leading platforms) [1][2]
- Chinese AI companies are making a significant impact, both with products for the domestic market (e.g., Quark, Doubao, Kimi) and with products built for international users (e.g., DeepSeek, Hailuo, Kling) [2]
- Some Chinese companies distribute their models through US properties like Krea or Hedra [2]

Google's Performance
- Google had a strong six months, with four unique properties making the web list: Gemini (number two, with about 10% of ChatGPT's web traffic but closer on mobile at half of ChatGPT's traffic), AI Studio (top 10), NotebookLM (number 13), and Google Labs (number 39) [1]
- Google's AI Studio, a developer-facing sandbox, surprisingly hit the top 10 [1]
- Google Labs includes Veo 3, a new video model, and other products like Doppl and Project Mariner [1]

All-Star Companies
- 14 companies have made the list every time over the past 2 years: ChatGPT, Perplexity, Poe (general LLM assistants); Character.AI (companionship); Midjourney, Photoroom, Leonardo, Cutout Pro, VEED, ElevenLabs (creative tools); QuillBot, Gamma (productivity); and Hugging Face, Civitai (model hosting) [3]
- Over half of the "all-star" companies host or aggregate other people's models, highlighting the importance of UI and product experience [3]

Future Predictions
- The industry expects verticalization of AI products, with users choosing different tools for specific tasks [10]
- Productivity-focused "prosumer" tools are expected to explode in usage as model reliability and UI improve [14][15]
- Emerging categories to watch include edtech, personal finance, and AI-native social platforms [18][19][20]
GPT-5 "Disappoints": Has AI "Hit a Wall"?
华尔街见闻· 2025-08-18 10:44
Core Viewpoint
- The release of OpenAI's GPT-5 has not met expectations, leading to disappointment and raising questions about the current limits of generative AI technology, even as capital markets remain enthusiastic about practical applications of AI [1][2][3].

Group 1: Performance and Expectations
- Users have reported low-level errors in GPT-5, such as incorrect labeling of the U.S. map, and expressed dissatisfaction with its performance compared to previous models [2][3].
- CEO Sam Altman acknowledged the release was "bumpy," attributing issues to a malfunctioning "automatic switcher" that caused the system to call a weaker model [3][4].
- The optimism surrounding AGI has not materialized with GPT-5, prompting a reassessment of its capabilities and of the competitive landscape, as rivals like Google and Anthropic have narrowed the gap with OpenAI [4][6].

Group 2: Scaling Laws and Limitations
- The "scaling laws" underpinning large language models are approaching their limits, with data exhaustion and the physical and economic constraints on computational power posing significant challenges [6][8].
- Training GPT-5 reportedly consumed hundreds of thousands of next-generation Nvidia processors, highlighting the immense energy demands of such models [6].

Group 3: Market Dynamics and Investment Trends
- Despite concerns about technological stagnation, investment in AI startups and infrastructure remains robust, with AI accounting for 33% of global venture capital this year [7][10].
- The focus of the AI race is shifting from achieving AGI to practical productization, with companies like OpenAI deploying engineers to help clients integrate AI models [8][9].
- Investors increasingly value the strong growth of products like ChatGPT, which generates $12 billion in annual recurring revenue for OpenAI, over the distant promise of AGI [10][11].
GPT-5 "Disappoints": Has AI "Hit a Wall"?
Hua Er Jie Jian Wen· 2025-08-17 03:00
Core Insights
- OpenAI's GPT-5 release did not meet expectations, disappointing users and raising questions about the future of AI development [1][3]
- The focus of the AI race is shifting from achieving AGI to practical applications and cost-effective productization [2][7]

Group 1: Performance and Expectations
- GPT-5's performance was criticized as subpar, with users reporting basic errors and a lack of significant improvement over previous models [1][3]
- The release has sparked debate over whether advances in generative AI have reached their limits, challenging OpenAI's high valuation of $500 billion [1][5]

Group 2: Market Sentiment and Investment
- Despite concerns about technological stagnation, investor enthusiasm for AI applications remains strong, with AI accounting for 33% of global venture capital this year [6][8]
- Companies are increasingly focused on integrating AI models into products, with OpenAI deploying engineers to assist clients, indicating a shift toward practical applications [7][8]

Group 3: Challenges and Limitations
- The "scaling laws" that have driven the development of large language models are approaching their limits due to data exhaustion and the physical and economic constraints of computational power [5][6]
- Historical parallels are drawn to past "AI winters," with warnings that inflated expectations could lead to a rapid loss of investor confidence [6]

Group 4: Future Directions
- The industry is moving toward multi-modal data and "world models" that understand the physical world, suggesting potential for future innovation despite current limitations [7]
- Investors see significant untapped value in current AI models, with strong growth in products like ChatGPT contributing $12 billion annually to OpenAI's recurring revenue [8]