Latency
X @Andy
Andy· 2025-07-21 15:21
Crypto Adoption Bottlenecks
- Latency and economic incentives, rather than TPS (transactions per second) alone, are identified as the key bottlenecks hindering crypto adoption [1]
Monad's Vision
- Monad's vision was inspired by sports betting [1]
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
AI Engineering & GPU Utilization
- AI engineering is shifting towards tighter integration and self-hosting of language models, increasing the need to understand GPU hardware [6][7]
- When using GPUs, the focus should be on high bandwidth, not low latency [8]
- GPUs optimize for math bandwidth over memory bandwidth, emphasizing computational operations [9]
- Low-precision matrix-matrix multiplications are key to fully utilizing GPU potential [10]
- Tensor cores, specialized for low-precision matrix-matrix multiplication, are crucial for efficient GPU usage [6][37]
Hardware & Performance
- GPUs achieve parallelism far exceeding that of CPUs: the Nvidia H100 SXM GPU runs over 16,000 parallel threads at about 5 centiwatts (0.05 W) per thread, compared to an AMD EPYC CPU's two threads per core at approximately 1 watt per thread [20][21]
- GPUs context-switch much faster than CPUs, and can switch every clock cycle [23]
- Bandwidth improves at roughly the square of the rate at which latency improves, which favors bandwidth-oriented hardware [25][26]
Model Optimization
- Small models can be more hardware-sympathetic, potentially matching the quality of larger models with techniques like verification and multiple generations [32][33]
- Multi-token prediction and multi-sample queries can become nearly "free" due to tensor core capabilities [36]
- Generating multiple samples or tokens can improve performance by leveraging matrix-matrix operations [39]
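One way to see why extra samples or tokens can be nearly "free" is to count arithmetic per byte of memory traffic for a single weight-matrix multiply. The sketch below is an illustration under simple assumptions, not something taken from the talk: the function name, the 4096 x 4096 layer size, and the 2-byte element size are all hypothetical, and the model ignores caching and kernel details.

```python
def arithmetic_intensity(d_in: int, d_out: int, batch: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for Y = X @ W, with X of shape (batch, d_in) and W of shape (d_in, d_out).

    Assumes each operand is read or written exactly once (a simplification).
    """
    # One multiply and one add per weight, per row of X.
    flops = 2 * batch * d_in * d_out
    # Weights are loaded once regardless of batch; activations scale with batch.
    bytes_moved = bytes_per_elem * (d_in * d_out + batch * d_in + batch * d_out)
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical layer size; batch stands in for samples or tokens processed together.
    for batch in (1, 4, 16, 64):
        print(batch, round(arithmetic_intensity(4096, 4096, batch), 2))
```

With a batch of 1 the multiply is a matrix-vector product and does roughly one FLOP per byte, so it is limited by memory bandwidth. Larger batches reuse the same weights for more math, which is the regime tensor cores are built for.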
X @Solana
Solana· 2025-07-17 14:53
RT Mike | heymike.sol 🎒🪽 (@heymike777) Increase Bandwidth, Reduce Latency ...
Optimizing inference for voice models in production - Philip Kiely, Baseten
AI Engineer· 2025-06-28 02:39
Key Optimization Goal
- Aims to achieve a Time To First Byte (TTFB) below 150 milliseconds for voice models [1]
Technology and Tools
- Leverages open-source TTS models like Orpheus, which have an LLM backbone [1]
- Employs tools and optimizations such as TensorRT-LLM and FP8 quantization [1]
Production Challenges
- Client code, network infrastructure, and other outside-the-GPU factors can introduce latency [1]
- Common pitfalls exist when integrating TTS models into production systems [1]
Scalability and Customization
- Focuses on scaling TTS models in production [1]
- Extends the system to serve customized models with voice cloning and fine-tuning [1]
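Because client code and the network count toward TTFB, it helps to measure it from the client side. The following is a minimal sketch, assuming the TTS response is exposed as a lazy iterable of byte chunks; `measure_ttfb`, `fake_tts_stream`, and `TTFB_BUDGET_S` are hypothetical names, and the fake stream only simulates a server delay rather than calling any real model or API.

```python
import time
from typing import Iterable, Iterator, Tuple

TTFB_BUDGET_S = 0.150  # the 150 ms target mentioned in the summary above


def measure_ttfb(stream: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk)."""
    start = time.perf_counter()
    for chunk in stream:
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start, chunk
    raise ValueError("stream ended before any audio bytes arrived")


def fake_tts_stream(delay_s: float = 0.02) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response (hypothetical, for illustration only)."""
    time.sleep(delay_s)  # simulated model and network delay before the first byte
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    ttfb, _ = measure_ttfb(fake_tts_stream())
    print(f"TTFB: {ttfb * 1000:.1f} ms, within budget: {ttfb < TTFB_BUDGET_S}")
```

The timer only captures the full delay if the stream is lazy, that is, if the request is actually sent when iteration begins; otherwise start the clock before issuing the request.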
X @BREAD | ∑:
BREAD | ∑:· 2025-06-26 20:36
Performance & Scalability
- The scaling journey will not be a short one [1]
- Hyperliquid needs 0.5 seconds to create or update orders, which is considered slow [1]
- There is a need to reduce latency in perp platforms [1]
User Feedback
- CBB (@Cbb0fe) points out the slow order creation/update speed on Hyperliquid [1]
Chasing Nanoseconds: The Race in Digital Asset Markets | Denis Dariotis | TEDxMarianopolisCollege
TEDx Talks· 2025-06-18 16:09
Market Dynamics & Latency
- Latency, the delay in information transmission, is paramount in modern financial markets, outweighing strategy and logic [1][2]
- Traditional markets evolved from manual trades to digitized exchanges with set trading hours, where algorithms and methodologies govern market operations [2][3]
- High-frequency trading firms execute tens of billions of dollars in trades daily, which underscores the importance of minimizing latency [8]
Crypto Market Challenges & Opportunities
- Crypto markets operate 24/7/365 and lack the standardized governance of traditional markets, which poses challenges for institutional adoption [6][7]
- The absence of standardized trading infrastructure in crypto, including colocation, normalized data feeds, and fair exchange access, hinders institutional market access [5][8][9]
- Institutions with advanced, proprietary trading technologies currently dominate crypto, highlighting the need for broader access and transparency [9]
- Advancing the maturity of the crypto space is critical for achieving optimal market access, better pricing for customers, and transparency [9][12]
Infrastructure & Technology
- Traditional firms invest in infrastructure like underground tunnels and microwave towers to reduce latency by microseconds [5]
- Colocation, placing servers near exchanges, is crucial for minimizing latency and improving trading speed [8]
- Standardized data feeds across exchanges and fair access across jurisdictions are essential for institutional adoption in crypto [9]
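The appeal of microwave links and straighter routes comes down to propagation delay. The sketch below is a back-of-the-envelope illustration, not a figure from the talk: the 1,000 km distance is hypothetical, the refractive indices are typical textbook values (about 1.47 for silica fiber, about 1.0003 for air), and real routes also differ in path length and equipment delay.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum

FIBER_INDEX = 1.47   # typical silica fiber (assumed value)
AIR_INDEX = 1.0003   # air at ground level (assumed value)


def one_way_latency_us(distance_km: float, refractive_index: float) -> float:
    """One-way propagation delay in microseconds over a straight path."""
    speed_m_per_s = SPEED_OF_LIGHT_M_PER_S / refractive_index
    return distance_km * 1_000 / speed_m_per_s * 1e6


if __name__ == "__main__":
    distance_km = 1_000  # hypothetical distance between two exchanges
    fiber = one_way_latency_us(distance_km, FIBER_INDEX)
    microwave = one_way_latency_us(distance_km, AIR_INDEX)
    print(f"fiber: {fiber:.0f} us, microwave: {microwave:.0f} us, saved: {fiber - microwave:.0f} us")
```

Signals travel more slowly in glass than in air, so over the same distance a line-of-sight microwave path arrives first; shortening the path itself, as a tunnel does, reduces the delay further.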