Transformers
X @Avi Chawla
Avi Chawla· 2026-03-16 09:17
Big release from Kimi! They just released a new way to handle residual connections in Transformers. In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. If you consider this across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight=1, so every layer gets equal importance. This creates a problem called PreNorm dilution, where as the hidden st ...
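The residual arithmetic the post describes can be verified in a few lines. This is a minimal sketch, not Kimi's actual architecture: `toy_sublayer`, the normalization stand-in, and the 40-layer count are illustrative assumptions.

```python
import numpy as np

def pre_norm_block(h, sublayer):
    """Standard pre-norm residual: normalize, transform, add back with weight 1."""
    normed = h / (np.linalg.norm(h) + 1e-6)  # crude stand-in for LayerNorm
    return h + sublayer(normed)

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)

# Record each sub-layer's output so we can check the equal-weighted sum claim.
outputs = []
def toy_sublayer(z):
    out = 0.1 * z + 0.01          # toy attention/MLP stand-in
    outputs.append(out)
    return out

h = x0.copy()
for _ in range(40):               # 40 sub-layers, as in a deep Transformer
    h = pre_norm_block(h, toy_sublayer)

# The final hidden state is exactly x0 plus the unweighted sum of all 40 outputs:
reconstructed = x0 + np.sum(outputs, axis=0)
print(np.allclose(h, reconstructed))  # True — every layer contributes with weight 1
```

Because each sub-layer sees only the normalized input while the raw stream keeps growing, any single layer's contribution shrinks relative to the accumulated sum — the dilution effect the post refers to.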
Jurassic AI, The Lost World of AI Agents | Mohamed YOUSSFI | TEDxUniversity Hassan II of Casablanca
TEDx Talks· 2026-03-02 17:17
Hi everyone, I've come here today to share with you a metaphor between a fiction that has captivated millions of viewers and a reality that never ceases to astonish us every single day. The fiction is none other than the famous film Jurassic Park, the well-known lost world of the dinosaurs. You remember it well. So, this is a very well-known film, a very well-known fiction. Remember that in this film, scientists motivated by economic results ...
How Classical Thinking Shaped Modern AI | Peter Danenberg | TEDxBoston
TEDx Talks· 2026-02-02 16:40
[applause] So Peter, you know, I want to get into how you're such a renaissance guy, but maybe talk a little bit about your experience with Gemini. Like, where did it start? You were one of, I think, the first handful of people on Bard. Did someone at Google get everyone together and say, "Hey, this AI thing is big. We've got to do this"? You know, what was it like in the early days? And then how did it morph into something where there's a code red, you know, across the street? >> Tha ...
The Enterprise Brain for AI Agents with Glean and Cresta
Greylock· 2026-01-20 16:02
As you develop more and more agents, as you take these human-driven processes and agentify them, you have to think about how you bring that full, comprehensive enterprise context to all of these different agents. And ideally, we feel like AI should be like electricity: it just disappears. We'd, you know, be using AI without even knowing we're using AI. And it's almost like augmented reality before work. Arvin Ping, thank you so much for joining us for Greylock Change Agents. As you know, change ...
Ambarella Pitches Edge AI Pivot at Needham Conference, Ramping Transformer-Ready CV7 and N1 Chips
Yahoo Finance· 2026-01-14 01:10
Core Insights
- Ambarella is transitioning from a video processing supplier for human viewing to an edge AI semiconductor company focused on machine perception and on-device inference, with a product roadmap evolving from CNN-based AI accelerators to transformer-capable architectures [4][7]
- The company's second-generation AI chip family, CV2, accounts for approximately 80% of total revenue, with edge AI contributing similarly to the revenue mix [2][7]
- Ambarella's strategic pivot began around 2012, focusing on silicon architectures optimized for convolutional neural networks (CNNs) and now expanding to include transformers [3][4]

Product Development and Performance
- The third-generation CV7 family, including CV72 and CV75, is currently ramping up production, while the CV2 line still represents a significant portion of revenue [7][8]
- The low-end CV75 chip can run a ~2 billion-parameter model in real time at ~2 watts, priced around $20, while the N1 family can handle models up to 34 billion parameters, scalable to 70 billion [6][10]
- Ambarella's N1 "AI box" concept aggregates multiple edge endpoints for enhanced capabilities without replacing existing hardware, with the first design win entering production in Q2 [12]

Market Strategy and Growth Drivers
- Ambarella is broadening its channels by engaging with GSIs and ISVs, pursuing semi-custom ASIC projects, and identifying unexpected growth in telematics/fleet management and portable video markets [5][16]
- The company expects continued growth in fiscal 2027, driven by unit sales and average selling prices (ASPs), despite market uncertainties [13]
- Ambarella's portfolio now includes a family of 15 AI chips with ASPs ranging from $15 to $400, with a new CV7 chip offering significantly improved performance [14][15]

Industry Positioning and Future Outlook
- The automotive sector is anticipated to face challenges in 2025 due to delays among Western OEMs, but Ambarella remains committed to its autonomous driving investments [17][19]
- The company is exploring opportunities in edge AI applications across various sectors, including robotics and drones, while also considering licensing software for broader use cases [18]
- Ambarella's long-term corporate gross margin target remains between 59% and 62%, even with the introduction of semi-custom projects [16]
What Is the Engine of Science? | Mario Ponce | TEDxLINTAC Youth
TEDx Talks· 2025-10-20 15:27
Historical Context & Scientific Progress
- The presentation explores the driving forces behind scientific advancement, questioning whether it's curiosity, the pursuit of the common good, or progress itself [1]
- It highlights the contributions of lesser-known figures like Tycho Brahe, emphasizing that scientific progress is built upon the work of individuals, even those with flaws [9]
- The presentation traces the evolution of understanding of the solar system, from geocentric models to the heliocentric view championed by Copernicus and Galileo [4]

Tycho Brahe's Significance
- Tycho Brahe's meticulous data collection and development of precise astronomical instruments were crucial to scientific advancement [2]
- Brahe's discovery of a supernova brought him fame and resources, but his personal vices and ethical dilemmas led to his relative obscurity [2]
- Brahe's attempt to reconcile the geocentric and heliocentric models resulted in a flawed model, hindering his legacy [2]

The Interconnectedness of Scientific Discovery
- Johannes Kepler's laws of planetary motion were derived from Tycho Brahe's data, demonstrating the importance of collaboration and building upon previous work [4][5]
- Isaac Newton's laws of motion and calculus validated Kepler's laws, further solidifying the heliocentric model and revolutionizing physics [6][7][8]
- The presentation draws a parallel between Tycho Brahe's data-driven approach and modern statistical inference, neural networks, and large language models (LLMs) [10][11]

Ethical Considerations in Science
- The presentation raises the question of whether the rigorous work of someone with questionable character can be validated [12]
- It suggests that Tycho Brahe's ethical failings contributed to his being forgotten, emphasizing the importance of ethical decision-making in science [13]
- The speaker posits that humanity, with its inherent flaws and virtues, is the driving force behind science, urging individuals to strive for ethical conduct to shape a better future [14][15]
X @Avi Chawla
Avi Chawla· 2025-09-20 19:41
Technology Breakthroughs
- True technology breakthroughs are rare; the hype around KANs serves as a reminder of this [1]
- Shifts like the success of Transformers come along only once in a decade or more [1]

Industry Dynamics
- Transformers aligned with hardware, data, and economics, proving to be a significant breakthrough [1]
Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer· 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind human vision and language models in intelligence and in leveraging big pre-training [3][8][11]
- Current vision evaluations like ImageNet and COCO are saturated and primarily measure pattern matching, hindering the development of true visual intelligence [5][22]
- Vision models struggle with tasks requiring visual understanding, such as determining the time on a watch or understanding spatial relationships in images [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details not explicitly included in image captions [14][15]

Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model leveraging the DINOv2 pre-trained backbone to address the underutilization of large pre-trainings in visual models [20]
- Roboflow created RF100-VL, a new benchmark comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of visual models [24][25]
- RF100-VL includes challenging domains like aerial imagery, microscopy, and X-rays, and incorporates visual language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models struggle to generalize in the visual domain compared to the linguistic domain [30]
- Fine-tuning a YOLOv8 nano model from scratch on 10-shot examples performs better than zero-shot Grounding DINO on RF100-VL, highlighting the need for improved visual generalization [30][36][37]

Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- The scale of pre-training in the vision world is significantly smaller than in the language world, indicating room for growth [19]
- Roboflow makes its platform freely available to researchers, encouraging open-source data contributions to the community [33]
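The CLIP-style vision-language pre-training the talk critiques can be sketched with its core objective, a symmetric contrastive (InfoNCE) loss over paired image/text embeddings. This is a toy NumPy illustration of the objective only — the random "embeddings" and the temperature value are assumptions, not Roboflow's or OpenAI's code.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matching pairs sit on the diagonal of the similarity matrix; the loss
    pulls each image toward its own caption and away from the others. A
    visual detail never mentioned in any caption yields no training signal,
    which is the failure mode the talk attributes to this objective.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (batch, batch) similarities

    # Cross-entropy with the diagonal as the target, in both directions.
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
paired = rng.normal(size=(4, 16))
# Perfectly aligned pairs give a near-zero loss; random pairings score worse.
aligned_loss = clip_contrastive_loss(paired, paired)
random_loss = clip_contrastive_loss(paired, rng.normal(size=(4, 16)))
print(aligned_loss < random_loss)  # True
```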
Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai
AI Engineer· 2025-07-29 07:01
Core Problem & Solution
- The presentation introduces Exa, a search engine designed for AI, addressing the limitations of traditional search engines built for human users [5][23]
- Exa aims to provide an API that delivers any information from the web, catering to the specific needs of AI systems [22][41]
- Exa uses transformer-based embeddings to represent documents, capturing meaning and context beyond keywords [11][12]

AI vs Human Search
- Traditional search engines are optimized for humans, who use simple queries and want a few relevant links, while AIs require complex queries, vast amounts of knowledge, and precise, controllable information [23][24]
- AI agents need search engines that can handle multi-paragraph queries, search with extensive context, and provide comprehensive knowledge [31][32][33]
- Exa offers features like adjustable result counts (10, 100, 1000), date ranges, and domain-specific searches, giving AI systems full control [44]

Market Positioning & Technology
- Exa launched in November 2022 and gained traction for its ability to handle complex queries that traditional search engines struggle with [15]
- The company recognized the need for AI-driven search after the emergence of ChatGPT, realizing that LLMs need external knowledge sources [17][18]
- Exa combines neural and keyword search methods to provide comprehensive results, allowing agents to use different search types based on the query [47][48]

Future Development
- Exa is developing a "research endpoint" that uses multiple searches and LLM calls to generate detailed reports and structured outputs [51]
- The company envisions a future where AI agents have full access to the world's information through a versatile search API [48]
- Exa aims to handle a wider range of queries, including semantic and complex ones, turning the web into a controllable database for AI systems [38][39][40]
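The neural-plus-keyword hybrid retrieval described in the talk can be sketched as follows. Everything here is an illustrative stand-in, not Exa's models or API: the tiny corpus, the bigram-hashing "embedder" (a real system would use a transformer encoder), the term-overlap score standing in for BM25, and the blending weight `alpha`.

```python
import math
import re
from collections import Counter

DOCS = [
    "transformer embeddings capture meaning beyond keywords",
    "classical keyword search ranks documents by term overlap",
    "search engines built for humans return a few relevant links",
]

def embed(text, dim=32):
    """Toy 'neural' embedding: hash word bigrams into a fixed-size vector."""
    vec = [0.0] * dim
    words = re.findall(r"\w+", text.lower())
    for a, b in zip(words, words[1:] + ["</s>"]):
        vec[hash((a, b)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def keyword_score(query, doc):
    """Simple term-overlap count standing in for a BM25-style lexical score."""
    q = Counter(re.findall(r"\w+", query.lower()))
    d = Counter(re.findall(r"\w+", doc.lower()))
    return sum(min(q[t], d[t]) for t in q)

def hybrid_search(query, docs, alpha=0.5):
    """Blend cosine similarity of embeddings with the keyword overlap score."""
    q_vec = embed(query)
    scored = []
    for doc in docs:
        d_vec = embed(doc)
        neural = sum(a * b for a, b in zip(q_vec, d_vec))  # cosine similarity
        scored.append((alpha * neural + (1 - alpha) * keyword_score(query, doc), doc))
    return [doc for _, doc in sorted(scored, reverse=True)]

results = hybrid_search("keyword search for documents", DOCS)
print(results[0])  # → "classical keyword search ranks documents by term overlap"
```

The design point mirrors the talk: lexical overlap nails exact-term queries, embeddings catch paraphrases, and blending the two lets an agent trade precision for recall per query.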