RAG
A large-model technology roadmap that took a really long time to compile......
具身智能之心· 2025-09-16 00:03
Core Insights
- The article emphasizes the transformative impact of large models on various industries, highlighting their role in enhancing productivity and driving innovation in fields such as autonomous driving, embodied intelligence, and generative AI [2][4].
Group 1: Large Model Technology Trends
- The large model industry is undergoing significant changes characterized by technological democratization, vertical application, and open-source ecosystems [2].
- There is a growing demand for talent skilled in technologies like RAG (Retrieval-Augmented Generation) and AI Agents, which are becoming core competencies for AI practitioners [2][4].
- The article introduces a comprehensive learning community focused on large models, offering resources such as videos, articles, learning paths, and job exchange opportunities [2][4].
Group 2: Learning Pathways
- The community provides detailed learning pathways for various aspects of large models, including RAG, AI Agents, and multimodal models [4][5].
- Specific learning routes include Graph RAG, Knowledge-Oriented RAG, and Reasoning RAG, among others, aimed at both beginners and advanced learners [4][5].
- The pathways are designed to facilitate systematic learning and networking among peers in the field [5].
Group 3: Community Benefits
- Joining the community offers benefits such as access to the latest academic advancements, industrial applications, and job opportunities in the large model sector [7][9].
- The community aims to create a collaborative environment for knowledge sharing and professional networking [7][9].
- There are plans for live sessions with industry leaders to further enrich the community's offerings [65][66].
The concept of RAG is terrible: it makes everyone overlook the most critical problem in building applications
Founder Park· 2025-09-14 04:43
Core Viewpoint
- The article emphasizes the importance of Context Engineering in AI development, criticizing the current trend of RAG (Retrieval-Augmented Generation) as a misleading concept that oversimplifies complex processes [5][6][7].
Group 1: Context Engineering
- Context Engineering is considered crucial for AI startups, as it focuses on effectively managing the information within the context window during model generation [4][9].
- The concept of Context Rot, where the model's performance deteriorates with an increasing number of tokens, highlights the need for better context management [8][12].
- Effective Context Engineering involves two loops: an internal loop for selecting relevant content for the current context and an external loop for learning to improve information selection over time [7][9].
Group 2: Critique of RAG
- RAG is described as a confusing amalgamation of retrieval, generation, and combination, which leads to misunderstandings in the AI community [5][6].
- The article argues that RAG has been misrepresented in the market as merely using embeddings for vector searches, which is seen as a shallow interpretation [5][7].
- The author expresses a strong aversion to the term RAG, suggesting that it detracts from more meaningful discussions about AI development [6][7].
Group 3: Future Directions in AI
- Two promising directions for future AI systems are continuous retrieval and remaining within the embedding space, which could enhance performance and efficiency [47][48].
- The potential for models to learn to retrieve information dynamically during generation is highlighted as an exciting area of research [41][42].
- The article suggests that the evolution of retrieval systems may lead to a more integrated approach, where models can generate and retrieve information simultaneously [41][48].
Group 4: Chroma's Role
- Chroma is positioned as a leading open-source vector database aimed at facilitating the development of AI applications by providing a robust search infrastructure [70][72].
- The company emphasizes the importance of developer experience, aiming for a seamless integration process that allows users to quickly deploy and utilize the database [78][82].
- Chroma's architecture is designed to be modern and efficient, utilizing distributed systems and a serverless model to optimize performance and cost [75][86].
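The internal loop described under Context Engineering (choosing what goes into the context window) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not taken from the article or from Chroma: the `Chunk` type, the whitespace-based `count_tokens`, the relevance scores, and the greedy selection rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance to the current query; assumed to come from some retriever


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_context(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


# Hypothetical candidates with made-up scores.
candidates = [
    Chunk("Chroma is an open-source vector database.", 0.9),
    Chunk("Unrelated release notes from an old changelog that nobody asked about.", 0.2),
    Chunk("Context Rot: quality drops as tokens grow.", 0.7),
]
context = select_context(candidates, budget=14)
for chunk in context:
    print(f"{chunk.score:.1f}  {chunk.text}")
```

Keeping the budget small on purpose reflects the Context Rot point above: fewer, more relevant tokens are preferred over filling the window. The external loop (learning better scores over time) is not modeled here.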
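Wang Xingxing speaks out for the first time since Unitree Robotics announced its IPO: my biggest regret is not having learned AI earlier; Oracle and OpenAI sign a $300 billion computing power agreement | AIGC Daily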
创业邦· 2025-09-12 00:12
Group 1
- Tencent has open-sourced Youtu-GraphRAG, a new graph retrieval-augmented generation framework that combines large language models with the RAG paradigm. It aims to improve accuracy and traceability in complex Q&A tasks, particularly in knowledge-intensive scenarios such as enterprise knowledge base Q&A and personal knowledge management [2]
- Wang Xingxing, CEO of Unitree Robotics (Yushu Technology), said he regrets not learning AI earlier, pointing to the rapid advances in AI capabilities and the potential for integrating AI with robotics, especially in light of the company's recent IPO announcement [2]
- California's legislature is moving towards regulating AI chatbots with the passage of SB 243, which will require operators to implement safety protocols and will hold companies legally accountable if standards are not met. The bill is set to take effect on January 1, 2026 [2]
- Oracle has reportedly signed a $300 billion computing power agreement with OpenAI, one of the largest cloud service contracts in history, which requires 4.5 gigawatts of power capacity [2]
0.3B: Google open-sources a new model that runs on a phone even offline, and 0.2GB of memory is enough
36Kr· 2025-09-05 07:14
Core Insights
- Google has launched a new open-source embedding model called EmbeddingGemma, designed for edge AI applications. With 308 million parameters, it can be deployed on devices like laptops and smartphones for retrieval-augmented generation (RAG) and semantic search [2][3]
Group 1: Model Features
- EmbeddingGemma ranks highest among open multilingual text embedding models under 500 million parameters on the MTEB benchmark. It was trained on over 100 languages and is optimized to run in less than 200MB of memory [3][5]
- The model is designed for flexible offline work, providing customizable output sizes and a 2K token context window, making it suitable for everyday devices [5][13]
- It integrates seamlessly with popular tools such as sentence-transformers, MLX, and LangChain, facilitating user adoption [5][12]
Group 2: Performance and Quality
- EmbeddingGemma generates high-quality embedding vectors, which are crucial for accurate RAG: they improve the retrieval of relevant context and, in turn, the generation of contextually appropriate answers [6][9]
- The model's performance in retrieval, classification, and clustering tasks surpasses that of similarly sized models, approaching the performance of larger models like Qwen-Embedding-0.6B [10][11]
- It utilizes Matryoshka representation learning (MRL) to offer various embedding sizes, allowing developers to balance quality and speed [12]
Group 3: Privacy and Efficiency
- EmbeddingGemma operates effectively offline, ensuring user data privacy by generating document embeddings directly on device hardware [13]
- The model's inference time on EdgeTPU is under 15ms for 256 input tokens, enabling real-time responses and smooth interactions [12][13]
- It supports new functionalities such as offline searches across personal files and personalized chatbots, enhancing user experience [13][15]
Group 4: Conclusion
- The introduction of EmbeddingGemma signifies a breakthrough in miniaturization, multilingual capabilities, and edge AI, potentially becoming a cornerstone for the proliferation of intelligent applications on personal devices [15]
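The quality-versus-speed trade-off that MRL enables can be sketched without any model at all. Embeddings trained with Matryoshka representation learning are meant to stay useful when only their leading components are kept; a common follow-up step is to rescale the shortened vector to unit length. The function name `truncate_embedding` and the tiny 3-dimensional vector below are illustrative assumptions, not EmbeddingGemma's API or real output.

```python
import math


def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Made-up "full-size" embedding; a real one would have hundreds of dimensions.
full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
print(short)
```

Shorter vectors cost less to store and compare, at some loss of retrieval quality; where a library exposes its own truncation option, that is usually preferable to hand-rolled code like this.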
The job market for programmers has hit rock bottom..
猿大侠· 2025-09-04 04:11
Core Insights
- The job market for programmers has become increasingly competitive, with traditional skills being less valued in the face of AI advancements. However, those who can integrate existing skills with AI technologies are in high demand [1]
- A free course titled "Large Model Application Development - Employment Practice" is being offered to help individuals enhance their skills in AI application development, which is crucial for securing high-paying job offers [1][2]
Summary by Sections
Job Market Trends
- The demand for programmers has shifted, with HR now prioritizing knowledge of AI-related technologies such as RAG and fine-tuning [1]
- Programmers who extend their existing skills with AI capabilities can significantly improve their employability and salary potential. In one case cited, an individual saw a 30% salary increase after acquiring new skills [1]
Course Offerings
- The course includes technical principles, practical projects, and employment guidance, aimed at helping participants understand and utilize large models effectively [2][3]
- Participants will receive resources such as internal referrals, interview materials, and knowledge graphs to aid in their job search [3][24]
Technical Content
- The course covers key AI technologies, including RAG, Function Call, and Agent, which are essential for developing AI applications [6][10]
- It emphasizes practical experience through case studies and hands-on projects, allowing participants to build a strong portfolio for job applications [8][15]
Career Development
- The course aims to help individuals build a technical edge, connect with product teams, and avoid job market pitfalls, particularly for those nearing the age of 35 [12][20]
- Successful completion of the course is expected to lead to significant career advancements, with many participants already achieving job transitions [17]
Opening several large-model technical discussion groups (RAG/Agent/general large models, etc.)
自动驾驶之心· 2025-09-04 03:35
Group 1
- A technical discussion group focused on large models has been set up, inviting participants to discuss topics such as RAG, AI Agents, multimodal large models, and the deployment of large models [1]
- Interested individuals can join by adding the designated WeChat assistant and sending their nickname along with a request to join the large model discussion group [2]
AI reading web pages is truly different this time: Google Gemini unlocks a new "explain the web page in detail" skill
机器之心· 2025-09-02 03:44
Core Viewpoint
- Google is returning to its core business of search by introducing the Gemini API's URL Context feature, which allows AI to "see" web content like a human [1].
Group 1: URL Context Functionality
- The URL Context feature enables the Gemini model to access and process content from URLs, including web pages, PDFs, and images, with a content limit of up to 34MB [1][5].
- Unlike traditional methods where AI reads only summaries or parts of a webpage, URL Context allows for deep and complete document parsing, understanding the entire structure and content [5][6].
- The feature supports various file formats, including PDF, PNG, JPEG, HTML, JSON, and CSV, enhancing its versatility [7].
Group 2: Comparison with RAG
- URL Context Grounding is seen as a significant advancement over the traditional Retrieval-Augmented Generation (RAG) approach, which involves multiple complex steps such as content extraction, chunking, vectorization, and storage [11][12].
- The new method simplifies the process, allowing developers to achieve accurate results with minimal coding, eliminating the need for extensive data processing pipelines [13][14].
- URL Context can accurately extract specific data from documents, such as financial figures from a PDF, which would be impossible with just summaries [14].
Group 3: Operational Mechanism
- URL Context uses a two-step retrieval process to balance speed, cost, and access to the latest data. It first attempts to retrieve content from an internal index cache [25].
- If the URL is not cached, it performs real-time scraping to obtain the content [25].
- The pricing model is straightforward, charging based on the number of tokens processed from the content, encouraging developers to provide precise information sources [27].
Group 4: Limitations and Industry Trends
- URL Context has limitations: it cannot access content behind paywalls or content that requires specialized tools, such as YouTube videos, and it can process at most 20 URLs at once [29].
- The emergence of URL Context indicates a trend where foundational models are increasingly integrating external capabilities, reducing the complexity previously handled by application developers [27].
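The two-step retrieval described under Operational Mechanism (index cache first, live fetch on a miss) follows a familiar cache-aside pattern, sketched below. This is not Google's implementation or the Gemini API: the `UrlContentStore` class, the injected `fetch_live` callable, and the choice to store live results back into the cache are all assumptions made for illustration.

```python
from typing import Callable


class UrlContentStore:
    """Hypothetical two-step retriever: look in a cache first, fetch live on a miss."""

    def __init__(self, fetch_live: Callable[[str], str]) -> None:
        self._cache: dict[str, str] = {}
        self._fetch_live = fetch_live

    def get(self, url: str) -> tuple[str, str]:
        """Return (content, source), where source is "cache" or "live"."""
        if url in self._cache:
            return self._cache[url], "cache"
        content = self._fetch_live(url)
        # Assumption: live results are remembered for later requests.
        self._cache[url] = content
        return content, "live"


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP fetch so the sketch runs offline.
    return f"<content of {url}>"


store = UrlContentStore(fake_fetch)
first = store.get("https://example.com/report.pdf")
second = store.get("https://example.com/report.pdf")
print(first[1], second[1])
```

The first request pays the cost of a live fetch and later requests for the same URL are served from the cache, which is the speed and cost trade-off the summary describes.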
A hit within a year, with 49.1k stars and 2 million downloads: Cline is not an open-source Cursor, yet it comes out ahead?!
AI前线· 2025-08-20 09:34
Core Viewpoint
- The AI coding assistant market is facing significant challenges, with many popular tools operating at a loss due to unsustainable business models that rely on venture capital subsidies [2][3].
Group 1: Market Dynamics
- The AI market is forming a three-tier competitive structure: a model layer focusing on technical strength, an infrastructure layer competing on price, and a coding tools layer emphasizing functionality and user experience [2].
- Companies like Cursor are attempting to bundle these layers together, but this approach is proving unsustainable as the costs of AI inference far exceed the subscription fees charged to users [2][3].
Group 2: Cline's Approach
- Cline adopts an open-source model, believing that software should be free, and generates revenue through enterprise services such as team management and technical support [5][6].
- Cline has rapidly grown to a community of 2.7 million developers within a year, showcasing its popularity and effectiveness [7][10].
Group 3: Product Features and User Interaction
- Cline introduces a "plan + action" paradigm, allowing users to create a plan before executing tasks, which enhances user experience and reduces the learning curve [12][13].
- The system allows users to switch between planning and action modes, facilitating a more intuitive interaction with the AI [13][14].
Group 4: Economic Value and Market Position
- Programming is identified as the most cost-effective application of large language models, with a growing focus from model vendors on this area [21][22].
- Cline's integration with various services and its ability to streamline interactions through natural language is seen as a significant advantage in the evolving market landscape [22][23].
Group 5: MCP Ecosystem
- The MCP (Model Context Protocol) ecosystem is developing, with Cline facilitating user understanding and implementation of MCP servers, which connect various tools and services [24][25].
- Cline has launched over 150 MCP servers, indicating a robust market presence and user engagement [26].
Group 6: Future Directions
- The future of programming tools is expected to shift towards more natural language interactions, reducing reliance on traditional coding practices [20][22].
- As AI models improve, the need for user intervention is anticipated to decrease, allowing for more automated processes in software development [36][39].
X @Avi Chawla
Avi Chawla· 2025-08-18 06:30
Product Overview
- Tensorlake transforms unstructured documents into RAG-ready data with a few lines of code [1]
- It returns document layout, structured extraction, and bounding boxes [1]
- The solution works on complex layouts, handwritten documents, and multilingual data [1]
Target Audience
- The information is relevant for individuals interested in Data Science (DS), Machine Learning (ML), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) [1]
X @Avi Chawla
Avi Chawla· 2025-08-16 06:30
That's a wrap!
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
Avi Chawla (@_avichawla):
A graph-powered all-in-one RAG system!
RAG-Anything is a graph-driven, all-in-one multimodal document processing RAG system built on LightRAG.
It supports all content modalities within a single integrated framework.
100% open-source. https://t.co/XGpDK0Ctht ...