Giving My Notes a "Dimensionality-Reduction Strike": Plotting 3000+ Notes on a 2D Coordinate Plane
36Kr · 2025-10-20 08:15
Core Insights
- The article discusses the use of embedding vectors for AI search in note-taking applications, emphasizing the importance of maintaining semantic proximity in high-dimensional vector spaces [1][38]
- It highlights the application of the t-SNE algorithm for visualizing high-dimensional data in a lower-dimensional space while preserving local similarities [1][38]

Group 1: Application Features
- The application (cflow) allows users to visualize over 3000 notes as points in a 2D coordinate system based on their embedding vectors [2][4]
- Users can interact with the visualization by clicking on points to view note content and see connections between notes through visual links [4][8]
- The application supports advanced search functionality, allowing users to highlight search results and quickly access related notes through tags [4][6]

Group 2: User Experience
- Users can explore the visualization to find interesting notes and their relationships, leading to the discovery of "orphan notes" that lack further connections [8][9]
- The clustering of notes around shared themes or experiences, such as restaurant reviews, demonstrates the effectiveness of the embedding algorithm [9][13]
- The application allows users to input new text and see its position relative to existing notes, providing a playful way to explore semantic relationships [24][25]

Group 3: Insights on Note Clustering
- The clustering of notes related to personal achievements shows how otherwise unrelated notes can be grouped by semantic similarity [13][15]
- The proximity of notes under different tags, such as AI and CODING, indicates overlapping themes in the user's knowledge management [17][19]
- The article notes that investment-related notes separate into distinct clusters, suggesting potential mislabeling or thematic divergence [21][22]

Group 4: Additional Features and Reflections
- The application includes features for creating automated spaces that save and organize notes, enhancing the user experience [36]
- The author reflects on the revolutionary nature of embedding technology in machine learning, highlighting its ability to transform textual data into meaningful vector representations [38]
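The embed-then-project pipeline described above can be sketched as follows. This is a minimal illustration, not cflow's actual code: the note embeddings are random stand-ins, and the t-SNE parameters are generic defaults.

```python
# Sketch of the note-visualization idea: project high-dimensional note
# embeddings down to 2D with t-SNE, which preserves local neighborhoods
# so semantically similar notes land near each other on the plane.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 768))  # stand-in: 300 notes, 768-dim vectors

# Each note becomes one (x, y) point; nearby points ~ similar meaning.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(coords.shape)  # (300, 2)
```

In the real application each row of `embeddings` would come from an embedding model run over the note text, and the resulting coordinates would drive the interactive scatter plot.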
X @Avi Chawla
Avi Chawla· 2025-10-06 19:22
Model Training Strategy
- The initial approach of capturing user images and training a binary classifier for face unlock is flawed, due both to the need for on-device training and to the difficulty of obtaining "Class 0" samples [1][2]
- A Siamese network trained via contrastive learning offers a more suitable solution for face unlock systems [2]
- Contrastive learning maps data points into a shared embedding space, where low distance indicates similarity and high distance indicates dissimilarity [3]
- The system creates a dataset of face pairs, labeling pairs of the same person as 0 and pairs of different people as 1, then trains a supervised model [3]
- A neural network generates an embedding for each image; contrastive loss minimizes the distance between embeddings of similar faces and maximizes it for dissimilar faces [4]
- The contrastive loss function, L = (1-y)*D^2 + y*max(0, margin-D)^2, guides the model to produce low distances for similar inputs and high distances for dissimilar inputs [5]

Face Unlock System Implementation
- During setup, the user's facial data generates a reference embedding; subsequent unlocks compare new embeddings against this reference without further training [6]
- New identities can be added by creating additional reference embeddings [6]
- During unlock, the incoming user's embedding is compared against all reference embeddings [7]
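The contrastive loss above can be written out directly. This is a sketch of the formula as stated in the thread, not Apple's implementation; the embeddings and margin are illustrative values.

```python
# Contrastive loss: L = (1 - y) * D^2 + y * max(0, margin - D)^2,
# where y = 0 labels a same-person pair and y = 1 a different-person pair.
import numpy as np

def contrastive_loss(emb_a, emb_b, y, margin=1.0):
    """y = 0: pull embeddings together; y = 1: push them at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b)  # Euclidean distance between embeddings
    return (1 - y) * d**2 + y * max(0.0, margin - d)**2

a = np.array([0.1, 0.2])
b = np.array([0.1, 0.2])
print(contrastive_loss(a, b, y=0))  # identical pair labeled "same": loss 0.0
print(contrastive_loss(a, b, y=1))  # identical pair labeled "different": loss margin^2 = 1.0
```

Note how the two terms trade off: for same-person pairs only the distance itself is penalized, while different-person pairs incur loss only when they fall inside the margin.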
X @Avi Chawla
Avi Chawla· 2025-10-06 06:31
You're in an ML Engineer interview at Apple. The interviewer asks: "You have to build an ML-based face unlock system for iPhones. How would you train the model?" You: "I will capture the user's images & train a binary classifier on them." Interview over. Here's what you missed: There are multiple issues with capturing the user's images and training a classifier. > Firstly, you'd need on-device training, which can be expensive. > All images provided by the user will be "Class 1" samples. To train a binary classifier, where ...
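The pair-labeling step that replaces the flawed binary-classifier setup can be sketched like this. The `make_pairs` helper and the toy data are illustrative, not from the thread; real training data would be image tensors rather than strings.

```python
# Build (image_a, image_b, label) pairs for contrastive training:
# label 0 = same person, label 1 = different people.
import itertools
import random

def make_pairs(people):
    """people: dict mapping person_id -> list of face images (stand-ins here)."""
    pairs = []
    for pid, imgs in people.items():
        for a, b in itertools.combinations(imgs, 2):
            pairs.append((a, b, 0))  # same person -> label 0
    for pa, pb in itertools.combinations(list(people), 2):
        pairs.append((random.choice(people[pa]),
                      random.choice(people[pb]), 1))  # different people -> label 1
    return pairs

people = {"alice": ["a1", "a2"], "bob": ["b1", "b2"]}
pairs = make_pairs(people)
print(len(pairs))  # 2 same-person pairs + 1 cross-person pair = 3
```

This sidesteps the "Class 0" problem: negatives come from pairing different identities in a pre-collected dataset, so no per-user negative samples or on-device training are needed.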
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
RAG is 80% retrieval and 20% generation. So if RAG isn't working, it's most likely a retrieval issue, which in turn originates from chunking and embedding. Contextualized chunk embedding models solve this. Let's dive in to learn more! https://t.co/vnQ5tAj1oe ...
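A hedged sketch of the contextualized-chunk idea the tweet gestures at: instead of embedding each chunk in isolation, prepend document-level context so the vector carries information the chunk alone lacks. The `contextualize` helper, the `embed` stand-in, and the sample text are all hypothetical, not the model or API the thread describes.

```python
# Prepend document context before embedding, so an ambiguous chunk like
# "Revenue grew 12%" embeds differently depending on which document it's from.
def contextualize(doc_title, doc_summary, chunk):
    return f"Document: {doc_title}\nContext: {doc_summary}\nChunk: {chunk}"

def embed(text):
    # Stand-in: a real system would call an embedding model here.
    return [float(ord(c)) for c in text[:8]]

chunk = "Revenue grew 12% quarter over quarter."
vector = embed(contextualize("ACME 10-Q", "Q3 earnings report", chunk))
```

The retrieval index then stores `vector` against the original chunk text, so search quality improves without changing the generation side at all.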