Transformer Architecture
Transformer Is a Poor Fit for Embodied Intelligence: Strong Large Models ≠ Strong Robots
36Kr· 2025-06-18 11:55
Core Insights
- The year 2025 is anticipated to be the "Year of Embodied Intelligence," driven by significant events and advancements in robotics and AI technologies [1]
- There is growing interest and investment in the field of general robotics, but concerns about sustainability and potential market bubbles persist [1]
- Experts are exploring the challenges and advancements in embodied intelligence, focusing on the gap between technological ideals and engineering realities [1]

Group 1: Industry Trends
- A surge in robotics startups and investments indicates a strong belief in the potential of general robotics [1][2]
- The transition from multi-modal large models to embodied intelligence is seen as a natural evolution, requiring substantial data and infrastructure improvements [3][4]
- Current AI models face limitations in multi-task scenarios, highlighting the need for better adaptability and learning mechanisms [5][6]

Group 2: Technical Challenges
- The high energy consumption and training costs of large models pose significant challenges for their application in robotics [4][5]
- There is a notable gap between the capabilities of large models and the multi-modal sensory systems of robots, complicating their integration [6][7]
- The industry is exploring both modular and end-to-end architectures for embodied intelligence, with a shift towards more unified systems [9][10]

Group 3: Research and Development
- Research is focused on bridging the gap between human, AI, and robotic intelligence, aiming for better collaboration and understanding [16][18]
- The current state of embodied intelligence is limited, with robots primarily executing pre-defined tasks rather than understanding human needs [18][19]
- Future developments may involve creating systems that can interpret human intentions directly, bypassing traditional communication methods [20][21]

Group 4: Future Outlook
- Experts believe that achieving true embodied intelligence will require overcoming significant technical hurdles, particularly in understanding and interacting with the physical world [23][24]
- The evolution of AI architectures, particularly beyond the current Transformer models, is essential for the long-term success of embodied intelligence [24][25]
- The next five to ten years are expected to be critical for advancements in both hardware and software, potentially leading to widespread adoption of household robots [31][32]
Understanding DeepSeek and OpenAI in One Article: Why Do Entrepreneurs Need Cognitive Innovation?
混沌学园· 2025-06-10 11:07
In this era of rapidly evolving AI, we are watching the technology change not only how we live but also the rules of business. AI is no longer just cold algorithms: it can now think and reason much like a person, and in some respects it already outperforms the average person. This tells us that traditional models of technology and innovation are no longer enough; for companies to grow and stay competitive, they need a new way of thinking. AI is redefining how we understand and practice business innovation.

Recently, the Hundun editorial team sat in on a major new course by Professor Li Shanyou, founder of Hundun Academy (混沌学园): "Cognitive Innovation: From OpenAI to DeepSeek." From the perspective of enterprise innovation, the course traces how the world's two leading AI companies, OpenAI and DeepSeek, got to where they are today and what paths of innovation they took. It offers a clear and highly valuable framework for understanding technological and enterprise innovation in the AI era.

The professor digs into OpenAI's original vision and its strategy for taking on the incumbents, unpacks how large language models came to be, and in particular how AI capabilities went from simple accumulation to startling "emergence." He also explains how DeepSeek, with limited resources, carved out a distinctive "low-cost, high-performance" path, and explores how companies in the AI era can build an organizational model in which innovation keeps "emerging," moving toward technological leadership.
Large Model Special: Research Report on Large Model Architecture Innovation
Sohu Finance· 2025-06-06 11:38
Core Insights
- The report focuses on innovations in large model architectures, particularly addressing the limitations of the Transformer architecture and exploring industry pathways for improvement [1][2][7]
- As model sizes increase, the quadratic computational complexity of Transformer attention (O(n²)) leads to significant power consumption and efficiency bottlenecks when processing long sequences, prompting a demand for innovative solutions [1][2][15]
- The industry is currently exploring two main paths for architectural breakthroughs: improvements to the Transformer architecture and exploration of non-Transformer architectures [1][2][7]

Transformer Architecture Improvements
- Improvements to the Transformer architecture focus on optimizing the Attention mechanism, Feed-Forward Network (FFN) layers, and normalization layers [1][2][18]
- Techniques such as sparse attention and dynamic attention are being developed to enhance computational efficiency (a minimal sketch of the sparse-attention idea follows this summary), while Mixture of Experts (MoE) aims to improve sparse connection efficiency in FFN layers [1][2][18]
- LongRoPE and related techniques are enhancing positional encoding to better model long sequences [1][2][18]

Non-Transformer Architecture Exploration
- Non-Transformer architectures include new types of RNNs (e.g., RWKV, Mamba) and CNNs (e.g., Hyena Hierarchy), as well as other innovative architectures such as RetNet and LFM [1][2][7]
- RWKV optimizes state evolution through a generalized Delta Rule, while Mamba leverages state space models to enhance training efficiency [1][2][7]
- RetNet combines state space representations with multi-head attention to achieve parallel computation [1][2][7]

Industry Trends and Future Directions
- The industry is trending toward hybrid architectures that combine linear-attention Transformers with non-Transformer components, balancing performance and efficiency [2][7]
- The current phase is characterized by a peak in the traditional Transformer paradigm and an impending wave of architectural innovation, with significant focus on new RNN/CNN theoretical breakthroughs and practical engineering optimizations [2][7]
- Companies such as ByteDance and Alibaba are accelerating investment in hybrid architectures, driving the evolution of large models toward higher efficiency and lower energy consumption [2][7]
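To make the efficiency point concrete, here is a minimal NumPy sketch contrasting full self-attention, whose score matrix grows quadratically with sequence length, with a sliding-window variant in the spirit of the sparse-attention techniques the report surveys. It is an illustrative toy, not code from the report; the window size, tensor shapes, and function names are assumptions chosen for clarity.

```python
import numpy as np

def full_attention(Q, K, V):
    """Standard scaled dot-product attention: the n x n score matrix makes
    time and memory grow as O(n^2) in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def sliding_window_attention(Q, K, V, window=64):
    """Sparse variant: each query attends only to keys inside a local window,
    so cost drops to roughly O(n * window) at the price of losing direct
    access to global context."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        s = Q[i] @ K[lo:hi].T / np.sqrt(d)             # scores for the local window only
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ V[lo:hi]
    return out

if __name__ == "__main__":
    n, d = 512, 32
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(full_attention(Q, K, V).shape)               # (512, 32)
    print(sliding_window_attention(Q, K, V).shape)     # (512, 32)
```

In this toy, each query in the windowed variant touches only about window/n of the key-value pairs, which is where the efficiency gain over the quadratic baseline comes from.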
Three Top AI Technologists Share a Rare Stage to Discuss the AI Industry's Biggest "Rashomon"
36Kr· 2025-05-28 11:59
Core Insights
- The AI industry is currently experiencing a significant debate over the effectiveness of pre-training versus first-principles approaches, with notable figures such as OpenAI co-founder Ilya Sutskever suggesting that pre-training has reached its limits [1][2]
- The shift from a consensus-driven approach to exploring non-consensus methods is evident, as companies and researchers seek innovative solutions in AI [6][7]

Group 1: Industry Trends
- The AI landscape is witnessing a transition from a focus on pre-training to exploring alternative methodologies, with companies such as Sand.AI and NLP LAB leading the charge in applying multi-modal architectures to language and video models [3][4]
- The emergence of new models such as Dream 7B demonstrates the potential of applying diffusion models to language tasks, outperforming larger models like DeepSeek V3 [3][4]
- The consensus around pre-training is being challenged, with some experts arguing that it is not yet over, as there remains untapped data that could enhance model performance [38][39]

Group 2: Company Perspectives
- Alibaba's Qwen team, led by Lin Junyang, has faced criticism for being conservative, yet they emphasize that their extensive experimentation has led to valuable insights, ultimately reaffirming the effectiveness of the Transformer architecture [5][15]
- The exploration of Mixture of Experts (MoE) models is ongoing, with the team recognizing the potential for scalability while also addressing the challenges of training stability (a toy routing sketch follows this summary) [16][20]
- The industry is increasingly focused on optimizing model efficiency and effectiveness, with particular interest in balancing model size against performance [19][22]

Group 3: Technical Innovations
- The integration of different model architectures, such as using diffusion models for language generation, reflects a broader trend of innovation in AI [3][4]
- The challenges of training models on long sequences and the need for effective optimization strategies are critical areas of focus for researchers [21][22]
- Future breakthroughs may come from leveraging increased computational power to revisit previously unviable techniques, suggesting a cycle of innovation driven by advances in hardware [40][41]
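Since much of the discussion above turns on Mixture of Experts, the sketch below shows the core routing idea in miniature: a gating network selects the top-k experts per token, so only a fraction of the total parameters is active on any forward pass. This is a simplified NumPy illustration, not the Qwen team's implementation; the expert count, top-k value, and names are assumptions.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Toy Mixture-of-Experts layer: route each token to its top_k experts.

    x              : (n_tokens, d_model)
    expert_weights : (n_experts, d_model, d_model)  one dense matrix per expert
    gate_weights   : (d_model, n_experts)           router projection
    Only top_k of n_experts run per token, which is how MoE grows total
    parameter count without growing per-token compute proportionally.
    """
    logits = x @ gate_weights                          # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = top[t]
        probs = np.exp(logits[t, chosen])
        probs /= probs.sum()                           # renormalize over chosen experts
        for w, e in zip(probs, chosen):
            out[t] += w * (x[t] @ expert_weights[e])   # weighted sum of expert outputs
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts = 8, 16, 4
    x = rng.standard_normal((n_tokens, d_model))
    experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.1
    gate = rng.standard_normal((d_model, n_experts)) * 0.1
    print(moe_layer(x, experts, gate).shape)           # (8, 16)
```

The training-stability concerns mentioned above, such as balancing load across experts and avoiding router collapse, arise precisely because this discrete routing step is hard to optimize; the toy omits the auxiliary losses real systems use for that.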
What Are the Future Technology Trends for Autonomous Driving? Li Xiang: VLA Is the Most Capable Architecture at This Stage
news flash· 2025-05-07 13:27
Core Viewpoint
- Li Auto CEO Li Xiang discussed the transition of the company's assisted-driving system to the VLA architecture, questioning its efficiency compared with potential future architectures [1]

Group 1
- The VLA architecture is capable of addressing full autonomous driving, but whether it is the most efficient solution remains uncertain [1]
- Li Xiang noted that VLA is still based on the Transformer architecture, which raises the question of whether Transformer is the most efficient architecture available [1]
- At present, VLA is considered the most capable architecture [1]
In Depth | A Conversation with the Cerebras CEO: In 3-5 Years Our Reliance on Transformers Will Decrease, and NVIDIA's Market Share Will Fall to 50-60%
Z Potentials· 2025-04-06 04:55
Image source: 20VC with Harry Stebbings

Z Highlights: Andrew Feldman is the co-founder and CEO of Cerebras, the world's fastest AI inference and training platform. In this interview, he and 20VC host Harry Stebbings discuss how the AI era is changing what is demanded of chip design, along with broader industry trends.

How AI Is Changing Chip Requirements

Harry: It's great to see you. I've been looking forward to this conversation for a long time. Eric often mentions you to me and always speaks highly of you; thank you so much for agreeing to this interview.

Andrew: Harry, thanks for having me. It's an honor to be part of this conversation.

Harry: This is going to be a great conversation, and I feel like I'll learn a lot from you today. Let's go back to 2015: what opportunity did you and your team see in AI that led you to found Cerebras?

Andrew: We saw the rise of a new kind of workload, which for a computer architect is a dream come true. We had found a new problem worth solving, which meant it might be possible to build hardware better suited to it. Back in 2015, my co-founders Gary, Sean, JP, and Michael were among the first to foresee the rise of AI. This ...
A Hunan-Born Post-95 Female PhD Takes On Google, Aiming to Build AI That Doesn't "Run a Fever" When It Thinks
创业邦· 2025-03-19 09:28
Core Viewpoint
- Lu Xi Technology aims to challenge the dominance of the Transformer architecture in AI by building a brain-like computing ecosystem, introducing the NLM model, which significantly reduces energy consumption while improving inference efficiency [2][3][4]

Group 1: Company Overview
- Lu Xi Technology was founded in 2023 by two women born in the 1990s and is the first domestic company focused on brain-like computing [2]
- The NLM model, launched in 2024, is the first domestically developed large model built on a non-Transformer, brain-inspired architecture [2][12]
- The company has received approval from the National Internet Information Office for its generative AI services and deep synthesis algorithm services [2][12]

Group 2: Technology and Innovation
- The NLM model reduces energy consumption by more than 80% while improving inference efficiency severalfold compared with traditional models [12][13]
- The brain-like architecture mimics the neural structure of the human brain, achieving efficient computation and storage by activating only the relevant neurons (a toy event-driven sketch follows this summary) [4][12]
- The company is developing a range of products based on its NEURARK brain-like architecture, including foundational models and industry-specific models, to meet diverse market needs [12][15]

Group 3: Market Position and Strategy
- Lu Xi Technology aims to break dependence on NVIDIA chips by developing its own FPGA and ASIC chips tailored to large models [10][12]
- The company collaborates with state-owned enterprises and industry leaders to deploy its models across sectors including healthcare and disaster management [15]
- The company targets a model parameter scale of 600 billion by 2025, which would bring it closer to the complexity of the human brain [16]
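To illustrate the "activate only the relevant neurons" idea in the simplest terms, the sketch below implements a toy leaky integrate-and-fire step in NumPy: only units whose potential crosses a threshold fire, and downstream work is confined to that small active set. This is a generic neuromorphic-style illustration, not Lu Xi Technology's NEURARK implementation; the threshold, leak factor, and function names are assumptions.

```python
import numpy as np

def sparse_spiking_step(inputs, weights, potentials, threshold=0.5, leak=0.9):
    """One step of a toy leaky integrate-and-fire layer.

    Only neurons whose membrane potential crosses the threshold emit a spike,
    and downstream computation need only touch those spiking units, which is
    the usual source of the energy savings claimed for event-driven designs.
    """
    potentials = leak * potentials + inputs @ weights   # integrate input with leak
    spikes = potentials >= threshold                     # boolean spike mask
    potentials = np.where(spikes, 0.0, potentials)       # reset neurons that fired
    active = np.flatnonzero(spikes)                      # typically a small subset
    return spikes.astype(float), potentials, active

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_out = 32, 64
    w = rng.standard_normal((n_in, n_out)) * 0.1
    v = np.zeros(n_out)
    for step in range(5):
        x = rng.random(n_in)
        s, v, active = sparse_spiking_step(x, w, v)
        print(f"step {step}: {active.size}/{n_out} neurons fired")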