"I'm fed up with the Transformer": co-author Llion Jones says the AI field has ossified and is missing the next breakthrough
36Kr· 2025-10-26 23:24
Core Insights
- Llion Jones, co-author of the influential paper "Attention is All You Need," expressed his fatigue with the Transformer architecture at the recent TED AI conference, highlighting a stagnation in AI research due to an over-reliance on a single framework [2][20].

Group 1: Current State of AI Research
- Despite unprecedented investment and talent influx in AI, the field has become narrow-minded, potentially overlooking the next major breakthrough [2][8].
- Researchers are under pressure to publish quickly and avoid being "scooped," leading to a preference for safe, easily publishable projects over high-risk, transformative ideas [8][11].
- Jones noted that the current environment is reminiscent of the period before the introduction of the Transformer, when researchers were focused on minor improvements to RNNs, missing out on significant innovations [11][16].

Group 2: The Role of Freedom in Innovation
- Jones emphasized that the Transformer was born from a free and organic research environment, contrasting sharply with today's pressure-laden atmosphere [12][14].
- He suggested that fostering an exploratory research environment, where researchers can take risks without the fear of immediate results, is crucial for future breakthroughs [13][19].
- At Sakana AI, Jones aims to recreate the conditions that led to the creation of the Transformer, minimizing competitive pressures and encouraging innovative thinking [14][15].

Group 3: Implications for Future AI Development
- Jones warned that the success of the Transformer might be hindering the search for better technologies, as its current capabilities discourage exploration of alternatives [16][20].
- He called for a shift in the incentive structures within the AI research community to prioritize collaboration and shared discoveries over competition [18][19].
- The ongoing debate about the limitations of simply scaling Transformer models suggests that architectural innovation is necessary for continued progress in AI [19][20].
"I'm fed up with the Transformer": co-author Llion Jones says the AI field has ossified and is missing the next breakthrough
机器之心· 2025-10-25 03:20
Core Viewpoint
- The AI field is experiencing a paradox where increased resources and funding are leading to decreased creativity and innovation, as researchers focus on safe, publishable projects rather than high-risk, transformative ideas [3][11][29].

Group 1: Current State of AI Research
- Llion Jones, CTO of Sakana AI and co-author of the influential paper "Attention is All You Need," expressed frustration with the current focus on the Transformer architecture, suggesting it may hinder the search for the next major breakthrough [2][5][24].
- Despite unprecedented investment and talent influx into AI, the field has become narrow-minded, with researchers feeling pressured to compete rather than explore new ideas [3][11][16].
- Jones highlighted that the current environment leads to rushed publications and a lack of true scientific exploration, as researchers are concerned about being "scooped" by competitors [11][16].

Group 2: Historical Context and Comparison
- Jones recalled the organic and pressure-free environment that led to the creation of the Transformer, contrasting it with today's competitive atmosphere where researchers feel compelled to deliver quick results [19][30].
- He emphasized that the freedom to explore ideas without pressure from management was crucial for the development of the Transformer, a condition that is now largely absent [19][22].

Group 3: Proposed Solutions and Future Directions
- To foster innovation, Jones proposed increasing the "exploration dial" and encouraging researchers to share their findings openly, even at the cost of competition [21][26].
- At Sakana AI, efforts are being made to recreate a research environment that prioritizes exploration over competition, aiming to reduce the pressure to publish [22][30].
- Jones believes that the next significant breakthrough in AI may be overlooked if the current focus on incremental improvements continues, urging a shift towards collaborative exploration [26][31].
Zhipu's luck fell just a bit short: its visual token research collides with DeepSeek once again
量子位· 2025-10-22 15:27
Core Viewpoint
- The article discusses the competition between Zhipu and DeepSeek in the AI field, particularly focusing on the release of Zhipu's visual token solution, Glyph, which aims to address the challenges of long context in large language models (LLMs) [1][2][6].

Group 1: Context Expansion Challenges
- The demand for long context in LLMs is increasing due to various applications such as document analysis and multi-turn dialogues [8].
- Expanding context length significantly increases computational costs; for instance, increasing context from 50K to 100K tokens can quadruple the computational consumption [9][10].
- Merely adding more tokens does not guarantee improved model performance, as excessive input can lead to noise interference and information overload [12][14].

Group 2: Existing Solutions
- Three mainstream solutions to the long context problem are identified:
  1. **Extended Position Encoding**: This method extends the existing position encoding range to accommodate longer inputs without retraining the model [15][16].
  2. **Attention Mechanism Modification**: Techniques like sparse and linear attention aim to improve token processing efficiency, but do not reduce the total token count [20][21].
  3. **Retrieval-Augmented Generation (RAG)**: This approach uses external retrieval to shorten inputs, but may slow down overall response time [22][23].

Group 3: Glyph Framework
- Glyph proposes a new paradigm by converting long texts into images, allowing for higher information density and efficient processing by visual language models (VLMs) [25][26].
- By using visual tokens, Glyph can significantly reduce the number of tokens needed; for example, it can represent the entire text of "Jane Eyre" using only 80K visual tokens compared to 240K text tokens [32][36].
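The scaling and compression figures above can be checked with back-of-envelope arithmetic. A minimal sketch, assuming a simple quadratic cost model for self-attention (the model width `d_model` is an illustrative assumption, not a value from the article):

```python
def attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    # Self-attention score computation scales quadratically in sequence length.
    return n_tokens ** 2 * d_model

# Doubling the context from 50K to 100K tokens quadruples attention compute.
print(attention_flops(100_000) / attention_flops(50_000))  # 4.0

# Glyph's reported "Jane Eyre" numbers: 240K text tokens vs 80K visual tokens.
print(240_000 / 80_000)  # 3.0x compression
```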
- The training process for Glyph involves three stages: continual pre-training, LLM-driven rendering search, and post-training, which collectively enhance the model's ability to interpret visual information [37][44].

Group 4: Performance and Results
- Glyph achieves a token compression rate of 3-4 times while maintaining accuracy comparable to mainstream models [49].
- The implementation of Glyph results in approximately four times faster prefill and decoding speeds, as well as two times faster supervised fine-tuning (SFT) training [51].
- Glyph demonstrates strong performance in multimodal tasks, indicating its robust generalization capabilities [53].

Group 5: Contributors and Future Implications
- The primary author of the paper is Jiale Cheng, a PhD student at Tsinghua University, with contributions from Yusen Liu, Xinyu Zhang, and Yulin Fei [57][62].
- The article suggests that visual tokens may redefine the information processing methods of LLMs, potentially leading to pixels replacing text as the fundamental unit of AI input [76][78].
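Why a 3x token compression can translate into roughly 4x faster prefill is easy to see with a toy cost model (purely illustrative; the linear-term weights below are assumptions, not measurements from the paper): the quadratic attention term shrinks with the square of the compression factor, while linear layers shrink only proportionally, so the end-to-end speedup lands somewhere between 3x and 9x.

```python
def prefill_cost(n_tokens: int, lin_weight: float) -> float:
    # Toy prefill model: quadratic attention term + linear (MLP, projection) term.
    return n_tokens ** 2 + lin_weight * n_tokens

# 3x compression (240K -> 80K tokens): the quadratic term shrinks 9x,
# the linear term 3x, so the overall speedup falls between 3x and 9x.
for lin_weight in (0, 100_000, 1_000_000):
    speedup = prefill_cost(240_000, lin_weight) / prefill_cost(80_000, lin_weight)
    print(lin_weight, round(speedup, 2))
```

The heavier the linear component, the closer the speedup sits to the plain 3x compression factor, which is consistent with the roughly 4x prefill speedup reported above.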
X @OpenSea
OpenSea· 2025-10-10 19:00
NFT Platform Updates - Yuga Labs officially launched Glyph on OpenSea [1] - The launch encourages users to engage with the platform ("Sign in Ape in") [1]