Quantization
X @Polyhedra
Polyhedra· 2025-10-31 12:00
4/ Gemma3 Quantization: Introduced front-end padding for variable-length inputs to enable flexible inference.
Next Steps (Circuit Optimization): Plan to prune Gemma3 nodes during circuitization to reduce redundancy and improve proof efficiency.
Stay tuned for more updates. ...
X @Avi Chawla
Avi Chawla· 2025-10-30 06:31
voyage-3-large embedding model just topped the RTEB leaderboard!

It's a big deal because it:
- ranks first across 33 eval datasets
- outperforms OpenAI and Cohere models
- supports quantization to reduce storage costs

Here's another reason that makes this model truly superior: most retrieval benchmarks test models on academic datasets that don't reflect real-world data. RTEB, on the other hand, is a newly released leaderboard on Hugging Face that evaluates retrieval models across enterprise domains like finance, la ...
Hard Work is Useless. This is What Matters. | Manav Gupta | TEDxGHRCEMN
TEDx Talks· 2025-10-24 15:23
Career Advice for College Students in the AI Era
- Traditional hard work is incomplete; direction is crucial for success, emphasizing a shift toward vector quantities in career planning [1][2]
- Focus on "why" and desired outcomes before acting, aligning actions with career goals [2]
- The speaker shares personal experiences and advice on navigating the tech landscape, particularly in AI [2]

Key Trends in AI
- The AI sector is currently dominated by business-to-consumer (B2C) companies such as Lovable, Bolt, Perplexity, and OpenAI [2]
- A critical missing element in the AI boom is infrastructure to support its growth [2][3]
- Infrastructure and distribution are key areas for college students to focus on to gain an advantage [3]

Technical Skills and Opportunities
- Quantization, a technique for running heavy systems on fewer resources, is crucial given upcoming resource shortages in AI [5][9]
- Learning AI infrastructure, especially quantization, requires dedicated effort (6-10 months) to understand the underlying engine [9][10]
- Distribution, or effectively reaching people, is vital in today's AI landscape, where creation is easier [10][11]

Personal Branding and Networking
- Actively engage on social media platforms like LinkedIn to build a personal brand and leverage distribution [15]
- Asymmetrical returns are possible through distribution, where efforts can lead to disproportionately large outcomes [15][16]
- Off-campus efforts are essential to stand out, as relying solely on college placements is insufficient due to the large number of students [17]
X @Avi Chawla
Avi Chawla· 2025-10-18 06:31
Model Quantization
- Keras enables model quantization with a single line of code [1]
- Supports quantization to int4, int8, float8, and GPTQ modes [1]
- Can quantize a user's own models or pre-trained models from KerasHub [1]
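For intuition, here is what an int8 mode does under the hood, reduced to a minimal pure-Python sketch of symmetric per-tensor quantization. Function names are illustrative, not the Keras API:

```python
def quantize_int8(weights):
    """Map floats to int8 codes using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    return [max(-127, min(127, round(w / scale))) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]

w = [0.5, -1.27, 0.031, 1.0]
q, s = quantize_int8(w)        # 4 bytes/weight down to 1 byte (+ one scale)
w_hat = dequantize(q, s)       # each element within scale/2 of the original
```

Real libraries refine this with per-channel scales, zero points for asymmetric ranges, and calibration data, but the storage win is the same: one byte per weight instead of four.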
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators

Full operator support will require 1+ additional day, plus more time for accuracy testing. Stay tuned for more updates 🔥 ...
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Capabilities
- Voyage-context-3 supports 2048, 1024, 512, and 256 dimensions, with quantization support [1]

Cost Efficiency
- Voyage-context-3 (int8, 2048 dimensions) reduces vector database costs by 83% compared to OpenAI-v3-large (float, 3072 dimensions) [1]

Performance
- Voyage-context-3 delivers 860% better retrieval quality [1]
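The 83% figure follows from storage arithmetic: int8 stores 1 byte per dimension versus 4 bytes per float32 value (assuming the OpenAI vectors are stored as float32):

```python
# Storage per vector = dimensions × bytes per element.
voyage_bytes = 2048 * 1   # int8: 1 byte per dimension
openai_bytes = 3072 * 4   # float32: 4 bytes per dimension

savings = 1 - voyage_bytes / openai_bytes
print(f"{savings:.0%}")   # → 83%
```

Fewer dimensions and a narrower dtype compound: each factor alone helps, and together they cut per-vector storage to one sixth.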
360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
AI Engineer· 2025-07-16 17:59
Model Building and Training
- LinkedIn leverages large language models (LLMs) for personalization and ranking tasks, aiming to use one model for all tasks [2][3]
- The process involves converting user information into prompts, a method called "promptification" [8]
- LinkedIn builds a large 150-billion-parameter foundation model, Blue XL, then distills it into smaller, more efficient models such as a 3B model for production [12]
- Distillation from a large model is more effective than training a small model from scratch [14]
- Increasing data, model size (up to 8x22B), and context length can improve model performance, but longer contexts may require model adjustments [17][18][19]

Model Performance and Generalization
- The model improves performance for cold-start users, showing a growing gap over production models as interaction counts decrease [21]
- The model generalizes to new domains, performing on par with or better than task-specific production models on out-of-domain tasks [23]

Model Serving and Optimization
- LinkedIn focuses on model specification, pruning, and quantization to improve throughput and reduce latency for production [26]
- Gradual pruning and distillation are more effective than aggressive pruning, minimizing information loss [29][30]
- Mixed precision, with FP8 for activations and model parameters but FP32 for the LM head, is crucial for maintaining prediction precision [31][32]
- Sparsifying attention scores can reduce latency by allowing multiple item recommendations without each item attending to every other [34][35]
- LinkedIn achieved a 7x reduction in latency and a 30x increase in throughput per GPU through these optimization techniques [36]
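The gradual-versus-aggressive pruning contrast can be sketched in plain Python: zero a small fraction of the smallest-magnitude weights per step, leaving room to fine-tune or distill between steps, rather than removing everything at once. This is an illustrative sketch, not LinkedIn's implementation; `magnitude_prune` and the linear schedule are assumptions:

```python
def magnitude_prune(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of the weights."""
    n_prune = int(len(weights) * fraction)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def gradual_prune(weights, target_sparsity, steps):
    """Ramp sparsity linearly toward the target over several steps."""
    for step in range(1, steps + 1):
        current = target_sparsity * step / steps
        weights = magnitude_prune(weights, current)
        # (in practice: fine-tune or distill here between steps)
    return weights

w = [0.9, -0.1, 0.4, -0.8, 0.05, 0.6, -0.3, 0.2]
pruned = gradual_prune(w, target_sparsity=0.5, steps=4)
# half the weights are now zero; the largest-magnitude ones survive
```

Aggressive pruning is the `steps=1` case: the model must absorb the full information loss in a single shot, which is what the talk reports hurts quality.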
X @Avi Chawla
Avi Chawla· 2025-06-11 06:30
A great tool to estimate how much VRAM your LLMs actually need. Alter the hardware config, quantization, etc., and see:
- Generation speed (tokens/sec)
- Precise memory allocation
- System throughput, etc.

No more VRAM guessing! https://t.co/lZbIink12f ...
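The core of such an estimator is simple arithmetic: weight memory ≈ parameter count × bytes per parameter, which quantization shrinks directly. A back-of-the-envelope sketch (weights only, ignoring the KV cache and activation overhead that full tools also model):

```python
# Bytes per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gb(n_params, precision):
    """Approximate VRAM needed for model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[precision] / 1024**3

# A 7B-parameter model at different quantization levels:
for p in ("fp16", "int8", "int4"):
    print(p, round(weight_vram_gb(7e9, p), 1), "GiB")
```

Halving the bits roughly halves the weight footprint, which is why int4 quantization can fit a 7B model on a consumer GPU that fp16 cannot.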