Neural Network

Avi Chawla· 2025-08-25 06:30
I removed 74% of neurons from a neural network. It dropped the accuracy by just 0.50%. Here's a breakdown (with code): ...
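The post's headline numbers come from pruning. As a rough illustration of the idea, here is a minimal PyTorch sketch of structured (neuron-level) magnitude pruning; the three-layer MLP and the flat 74% ratio are assumptions made for this sketch, not the post's exact architecture or recipe, and how small the accuracy drop ends up being depends on the task and any fine-tuning after pruning.

```python
# Minimal sketch of structured (neuron-level) magnitude pruning.
# The MLP and the 74% ratio are illustrative assumptions, not the
# post's exact setup.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Zero out the 74% of hidden neurons (weight rows) with the smallest
# L2 norm; the output layer is left intact so no classes are removed.
hidden = [m for m in model if isinstance(m, nn.Linear)][:-1]
for layer in hidden:
    prune.ln_structured(layer, name="weight", amount=0.74, n=2, dim=0)
    prune.remove(layer, "weight")  # bake the pruning mask into the weights

# Report how many weight rows were zeroed in each hidden layer.
for layer in hidden:
    dead = (layer.weight.abs().sum(dim=1) == 0).sum().item()
    print(f"{dead}/{layer.out_features} neurons pruned")
```

Zeroing a whole weight row is what "removing a neuron" amounts to here; after fine-tuning, the surviving neurons can often absorb much of the lost capacity, which is how drops as small as 0.5% become plausible.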
The AlphaGo Moment for AI Models...
Matthew Berman· 2025-07-31 18:08
AI Model Architecture Discovery
- The AI field is approaching an era where AI can discover new knowledge and apply it to itself, potentially leading to exponential innovation [1][3]
- The current bottleneck in AI discovery is human innovation, which limits how fast AI advances can scale [2][3]
- The "AlphaGo moment" for model architecture discovery involves AI self-play to hypothesize, code, test, and analyze new model architectures [3][12]
- The key to this approach is AI's ability to learn without human input, discovering novel solutions unconstrained by human biases [8]

ASI-Arch System
- The ASI-Arch system uses a researcher, an engineer, and an analyst to autonomously propose, implement, test, and analyze new neural network architectures [13][14][15][16]; a toy sketch of this loop follows the summary below
- The system learns from past experiments and the human literature to propose new architectures, selecting top performers as references [14]
- The engineer component self-heals code to ensure new approaches are properly tested [15]
- The analyst reviews results, extracts insights, and maintains a memory of lessons learned for future generations of models [16]

Experimental Results and Implications
- The system ran 1,700 autonomous experiments over 20,000 GPU hours, producing 106 models that outperformed previous public models [17][18]
- Exponential improvement may be possible by increasing compute resources, for example scaling from 20,000 to 20 million GPU hours [19]
- With more compute, the same self-improving approach could be applied to other scientific fields such as biology and medicine [20]
- The open-sourced paper and code have significant implications, with multiple companies publishing similar self-improving AI papers [21]
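The propose, implement, test, analyze loop summarized above is easy to caricature in code. The sketch below is a self-contained toy, not the ASI-Arch paper's code: the "architectures" are just hyperparameter dictionaries, evaluate is a stand-in score rather than a 20,000-GPU-hour training run, and every function name is invented for illustration.

```python
# Toy sketch of a self-improving architecture-search loop.
# All names are illustrative placeholders, not the ASI-Arch API.
import random

def propose(top, lessons):
    """Researcher: mutate a past winner, or start fresh if memory is empty."""
    base = random.choice(top)["arch"] if top else {"depth": 4, "width": 64}
    return {k: max(1, v + random.choice([-1, 0, 1]) * max(1, v // 4))
            for k, v in base.items()}

def evaluate(arch):
    """Engineer + experiment: a stand-in score instead of a real training run."""
    return -abs(arch["depth"] - 12) - abs(arch["width"] - 256) / 32

def analyze(result, history):
    """Analyst: note whether this run beat the best seen so far."""
    best = max((r["score"] for r in history), default=float("-inf"))
    verdict = "improved" if result["score"] > best else "no gain"
    return f"depth={result['arch']['depth']} width={result['arch']['width']}: {verdict}"

results, lessons = [], []
for step in range(200):
    top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]
    arch = propose(top, lessons)
    result = {"arch": arch, "score": evaluate(arch)}
    lessons.append(analyze(result, results))
    results.append(result)

print("best:", max(results, key=lambda r: r["score"]))
```

Even this caricature shows the loop's key property: each proposal is seeded from the memory of past winners, so the search compounds on its own results rather than on human guesses.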
Why GPT-4.5 Failed
Matthew Berman· 2025-07-03 16:04
Model Performance
- GPT-4.5 is considered much smarter than previous versions, specifically GPT-4o and GPT-4.1 [1]
- Despite its intelligence, GPT-4.5 is deemed not very useful because it is too slow and expensive [1]
- Overparameterization caused GPT-4.5 to memorize data excessively during initial training, hindering generalization [2]; a toy illustration follows below

Development Challenges
- OpenAI encountered a bug in PyTorch during GPT-4.5's development, which they identified and fixed [2]
- The bug fix on GitHub received positive reactions from approximately 20 OpenAI employees [3]
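The overparameterization point is the classic capacity-versus-data trade-off, and it can be seen in miniature without any neural network. The numpy toy below, which has nothing to do with GPT-4.5's actual training, fits a noisy linear target once with a right-sized model and once with enough parameters to memorize the training set.

```python
# Toy illustration of overparameterization: a model with enough capacity
# to memorize 10 noisy points fits them perfectly but generalizes worse.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)  # noisy linear ground truth
x_test = np.linspace(-1, 1, 200)
y_test = 2 * x_test                              # noise-free targets

for degree in (1, 9):  # right-sized capacity vs. enough to memorize
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The degree-9 fit drives training error to essentially zero by threading through the noise, then pays for it with a much larger test error; that is the memorization-versus-generalization failure mode the summary describes, in miniature.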
Google Chief Scientist's Lecture Reviews a Decade of AI: Which Key Technologies Shaped Today's Large-Model Landscape?
机器人圈· 2025-04-30 09:10
Google Chief Scientist Jeff Dean delivered a lecture at ETH Zurich this April on major trends in artificial intelligence. The talk reviewed the key technical milestones that laid the foundations of modern AI, including neural networks and backpropagation, early large-scale training, hardware acceleration, the open-source ecosystem, architectural revolutions, training paradigms, model efficiency, and inference optimization, and argued that compute, data volume, model scaling, and innovation in algorithms and model architectures have played the decisive roles in advancing AI capability. The following transcript was translated and edited by the 数字开物 team.

01 AI is changing the computing paradigm with unprecedented scale and algorithmic progress

Jeff Dean: Today I want to discuss the major trends in AI with you. We will look back at how the field reached today's level of model capability, consider what we can do at the current state of the art, and ask how we should shape the future direction of AI.

This work was done together with many colleagues inside and outside Google, so it is by no means all my own; much of it is collaborative research. Some of it was not even led by me, but I consider all of it important and worth sharing and discussing here.

Let us start with a few observations, most of which are probably obvious to this audience. First, and I think most importantly, machine learning has completely changed our understanding of, and expectations for, what computers can do. Think back ten years: computer vision was still in its infancy, and computers could barely ...