LSTM
A clash of titans between LeCun and Hassabis, and Musk has picked a side
量子位· 2025-12-25 00:27
Yishui, reporting for QbitAI (official account: QbitAI).

A fight has broken out. A Turing Award laureate and a Nobel laureate have gotten into a heated, nominally friendly, exchange over the very nature of intelligence.

Yann LeCun, Turing Award winner and one of the "big three" of AI, stated flatly: it is complete BS.

Nobel laureate and Google DeepMind CEO Demis Hassabis did not hold back either, firing back at LeCun by name: LeCun's claim is simply dead wrong.

Of course, Musk's choice of side may have other motives. He and LeCun have never gotten along, whereas his relationship with Hassabis is part mentor, part friend; Musk was also an early investor in Hassabis's DeepMind.

The debate has been so fierce, and drawn so much attention, that a dedicated topic board has already been opened for it.

Musk also showed up to watch the show. He offered no further explanation, but this round he sided with Hassabis: "Demis is right."

It all started with an interview LeCun gave a few days ago. In it he argued sharply that there is no such thing as "general intelligence"; the notion is complete BS. The concept is meaningless, he said, because in practice it refers to human-level intelligence, yet human intelligence is itself highly specialized. We do well enough in the real world, recognizing routes, navigating, and so on, and we are especially good at dealing with other people, because that is what we have evolved to do all these years. But when it comes to international ...
New MIT finding: a decade of algorithmic progress has been overestimated
机器之心· 2025-12-11 02:47
Core Insights
- The article discusses the significant advances in AI driven by growing computational budgets and algorithmic innovation over the past decade [2][6]
- It highlights that while computational growth is easy to measure, quantifying algorithmic progress remains difficult, particularly regarding how large the efficiency improvements are and how they scale [2][3]

Group 1: Algorithmic Progress
- Research estimates that algorithmic advances have contributed over 4 orders of magnitude of effective compute over the past decade, while computational scale itself has grown by about 7 orders of magnitude [2]
- Overall model efficiency has improved by roughly 22,000 times thanks to algorithmic innovations, allowing similar performance to be reached with far fewer floating-point operations (FLOPs) [3][4]
- Most algorithmic innovations yield only minor efficiency improvements, amounting to less than a 10x overall efficiency gain when extrapolated to 2025-scale compute budgets [4][11]

Group 2: Scale-Dependent Innovations
- Two major scale-dependent algorithmic innovations, the move from LSTM to the Transformer architecture and the shift from Kaplan to Chinchilla scaling laws, account for 91% of the total efficiency improvement (a rough illustration of that shift follows this list) [4][22]
- Efficiency gains from algorithmic improvements are significantly larger in large-scale models than in small-scale models, indicating that algorithmic progress relies heavily on computational scale [6][25]
- The article suggests that the perceived rapid progress in algorithms may reflect growing computational budgets more than a steady stream of algorithmic breakthroughs [22][24]

Group 3: Experimental Findings
- The study used several methods, including ablation studies and scaling experiments, to analyze the impact of individual algorithms and their combinations [5][8]
- The findings reveal a highly skewed distribution of efficiency improvements, with a few key innovations contributing disproportionately to the overall gains [11][12]
- The scaling experiments show that improvements in neural network architectures are not scale-invariant but instead exhibit increasing returns to scale [20][21]
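To make the "effective compute" framing and the Kaplan-to-Chinchilla shift above more concrete, here is a minimal back-of-the-envelope sketch. It treats cumulative algorithmic efficiency as a simple multiplier on physical FLOPs and uses the commonly cited Chinchilla heuristics (training compute C ≈ 6·N·D and roughly 20 tokens per parameter); the functions, constants, and budget below are illustrative assumptions, not the paper's own estimation methodology.

```python
import math


def effective_compute(physical_flops: float, efficiency_multiplier: float) -> float:
    """Effective compute = physical FLOPs scaled by the cumulative
    algorithmic-efficiency multiplier (e.g. ~2.2e4 over the decade)."""
    return physical_flops * efficiency_multiplier


def chinchilla_allocation(compute_budget_flops: float, tokens_per_param: float = 20.0):
    """Split a training budget C ~= 6 * N * D under the compute-optimal
    heuristic D ~= tokens_per_param * N (the 'Chinchilla' recipe).
    Returns (parameter count N, training tokens D)."""
    n_params = math.sqrt(compute_budget_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 1e24  # physical training FLOPs (illustrative)

    # With a ~22,000x cumulative efficiency gain, this budget behaves like
    # ~2.2e28 FLOPs would have under decade-old training recipes.
    print(f"effective compute: {effective_compute(budget, 2.2e4):.2e} FLOPs")

    # Kaplan-era recipes pushed most extra compute into parameters; the
    # Chinchilla recipe rebalances it toward training data.
    n, d = chinchilla_allocation(budget)
    print(f"compute-optimal split: {n:.2e} parameters, {d:.2e} tokens")
```

For a 1e24 FLOP budget the sketch yields roughly 9e10 parameters trained on about 1.8e12 tokens, the data-heavy rebalancing that the Kaplan-to-Chinchilla transition refers to.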
Rejection ≠ failure: these high-impact papers were all rejected by top conferences
机器之心· 2025-12-11 02:47
Core Insights
- Waymo has released an in-depth blog post detailing its AI strategy, centered on its foundation model and emphasizing the use of distillation to create efficient models for onboard operation [1]
- Jeff Dean highlighted the significance of knowledge distillation in AI, recalling its initial rejection by NeurIPS 2014, which underestimated its potential impact [3][4]

Group 1: Historical Context of Rejected Papers
- Many foundational technologies in AI, such as optimizers for large models and computer vision techniques, were initially rejected by top conferences, showing a systemic lag in recognizing groundbreaking innovations [6]
- Notable figures in AI, including Geoffrey Hinton and Yann LeCun, had pioneering work rejected, often for reasons that seem absurd in hindsight, such as claims of lacking a theoretical basis or being overly simplistic [6]

Group 2: Specific Case Studies of Rejected Innovations
- LSTM, a milestone in handling sequential data, was rejected by NIPS in 1996, a period when statistical methods were favored, only to later dominate fields such as speech recognition [8]
- The SIFT algorithm, which dominated computer vision for 15 years, was rejected by ICCV and CVPR for being perceived as complex and inelegant, ultimately proving the value of robust engineering design [11]
- Dropout, a key regularization method for deep neural networks, was rejected by NIPS in 2012 as too radical, yet it became crucial to the success of models such as AlexNet [17]
- Word2Vec, despite its revolutionary impact on NLP, received a strong rejection at ICLR 2013 for a perceived lack of scientific rigor, but it quickly became a cornerstone of text representation [19][20]

Group 3: Reflection on Peer Review Limitations
- The peer review system often struggles to recognize disruptive innovations, leading to a "simplicity trap" in which reviewers equate mathematical complexity with research contribution [40]
- Reviewers tend to defend existing paradigms, which can block the acceptance of novel ideas that challenge traditional metrics of success [40]
- The demand for rigorous theoretical proof in an experimental field like deep learning can stifle practical breakthroughs, as seen in the initial skepticism toward methods like the Adam optimizer [40]

Group 4: Broader Implications
- The experiences of these rejected papers illustrate the nonlinear nature of scientific progress, highlighting that peer review, while essential, is limited by human cognitive biases [41]
- Historical anecdotes, such as the rejection of Einstein's paper on gravitational waves, underscore that the true measure of a work's impact is its long-term relevance rather than its immediate acceptance [42][44]
The father of LSTM takes aim at Kaiming He: my student is the true pioneer of residual learning
量子位· 2025-10-19 06:10
Core Viewpoint
- The article discusses the historical context and contributions of Sepp Hochreiter and Jürgen Schmidhuber to the development of residual learning and its impact on deep learning, emphasizing that the concept of residual connections was introduced by Hochreiter in 1991, long before its popularization in ResNet [3][12][26]

Group 1: Historical Contributions
- Sepp Hochreiter systematically analyzed the vanishing gradient problem in his 1991 diploma thesis and proposed recurrent residual connections to address it [3][12]
- The core idea of recurrent residual connections is a self-connecting neuron with a fixed weight of 1.0, which keeps the error signal constant during backpropagation [13][14]
- The introduction of LSTM in 1997 by Hochreiter and Schmidhuber built on this foundation, enabling effective learning of long-term dependencies in tasks such as speech and language processing [18][19]

Group 2: Evolution of Residual Learning
- The Highway network, introduced in 2015, successfully trained deep feedforward networks with hundreds of layers by incorporating the gated residual concept from LSTM [23]
- ResNet, which gained wide attention in the same year, used residual connections to stabilize error propagation in deep networks, allowing networks with hundreds of layers to be trained (a minimal sketch of both constructions follows this list) [24][26]
- Both Highway networks and ResNet share similarities with the principles Hochreiter established in 1991, demonstrating the enduring relevance of his contributions to deep learning [26]

Group 3: Ongoing Debates and Recognition
- Jürgen Schmidhuber has publicly claimed that various architectures, including AlexNet, VGG Net, GANs, and Transformers, were inspired by his lab's work, although these claims have not been universally accepted [28][31]
- The ongoing debate over the attribution of contributions in deep learning highlights how hard it is to recognize foundational work in a rapidly evolving field [10][32]
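To make the lineage described above concrete, here is a minimal sketch of the two constructions the summary mentions: a Highway-style gated layer, y = T(x) * H(x) + (1 - T(x)) * x, and a ResNet-style block, y = F(x) + x. Both keep an identity path along which the error signal can pass unchanged, echoing the fixed-weight-1.0 self-connection attributed to Hochreiter's 1991 analysis. The framework (PyTorch), layer types, and sizes below are illustrative assumptions, not details taken from the article or the original papers.

```python
import torch
import torch.nn as nn


class HighwayLayer(nn.Module):
    """Highway-style gated layer: y = T(x) * H(x) + (1 - T(x)) * x.
    The transform gate T decides how much of the input is carried
    through unchanged (the gated-residual idea borrowed from LSTM)."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # produces H(x)
        self.gate = nn.Linear(dim, dim)       # produces T(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))     # candidate transformation
        t = torch.sigmoid(self.gate(x))       # gate values in (0, 1)
        return t * h + (1.0 - t) * x          # gated mix with the identity path


class ResidualBlock(nn.Module):
    """ResNet-style block: y = F(x) + x. The ungated identity shortcut
    lets gradients flow through the '+ x' term unattenuated."""

    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)   # F(x) plus the identity shortcut


if __name__ == "__main__":
    x = torch.randn(4, 32)  # toy batch of 4 vectors with 32 features
    print(HighwayLayer(32)(x).shape)   # torch.Size([4, 32])
    print(ResidualBlock(32)(x).shape)  # torch.Size([4, 32])
```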