Transformer Architecture
Goodbye Transformer, Reshaping the Machine Learning Paradigm: Shanghai Jiao Tong University Debuts the First "Brain-Like" Large Model
机器之心· 2025-08-13 09:29
Core Viewpoint
- The article discusses the introduction of BriLLM, a new language model inspired by human brain mechanisms, which aims to overcome the limitations of traditional Transformer-based models, such as high computational demands, lack of interpretability, and context size restrictions [3][8]

Group 1: Limitations of Current Models
- Current Transformer-based models face three main issues: high computational requirements, black-box interpretability, and context size limitations [6][8]
- The self-attention mechanism in Transformers has time and space complexity of O(n²), so computational cost grows quadratically with input length [7]
- The internal logic of Transformers lacks transparency, making the model's decision-making process difficult to understand [7][8]

Group 2: Innovations of BriLLM
- BriLLM introduces a new learning mechanism called SiFu (Signal Fully-connected Flowing), which replaces traditional prediction operations with signal transmission, mimicking the way neural signals propagate in the brain [9][13]
- The model architecture is based on a directed graph in which every node is interpretable, unlike traditional models that offer limited interpretability only at the input and output layers [9][19]
- BriLLM supports unlimited context processing without increasing model parameters, allowing efficient handling of long sequences [15][16]

Group 3: Model Specifications
- BriLLM comes in two versions, BriLLM-Chinese and BriLLM-English, each with a non-sparse size of 16.90 billion parameters [21]
- The sparse Chinese model has 2.19 billion parameters and the sparse English model 0.96 billion, a parameter reduction of roughly 90% [21]
- The design allows integration of multiple modalities, enabling the model to process not just language but also visual and auditory inputs [25][26]

Group 4: Future Prospects
- The team aims to develop a multi-modal brain-inspired AGI framework that integrates perception and motion [27]
- BriLLM has been selected for funding under Shanghai Jiao Tong University's "SJTU 2030" plan, which supports groundbreaking research projects [27]
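The summary describes SiFu only at a high level: prediction is replaced by a signal flowing through a directed graph of interpretable nodes, and the node the signal reaches most strongly becomes the next token. The toy sketch below illustrates that idea only; the graph, scalar edge weights, and function names are invented for illustration and are not BriLLM's actual implementation (the paper's edges are learned tensors, not scalars).

```python
import numpy as np

# Toy vocabulary: each token is a node in a fully connected directed graph.
# Scalar edge weights stand in for BriLLM's learned transmission parameters
# (an assumption for brevity; the real model uses matrices per edge).
vocab = ["the", "cat", "sat", "mat"]
rng = np.random.default_rng(0)
W = rng.random((len(vocab), len(vocab)))  # W[i, j]: signal gain from node i to node j
np.fill_diagonal(W, 0.0)                  # no self-loops

def sifu_step(active_idx: int) -> int:
    """Propagate a unit signal from the active node; the node receiving
    the strongest signal becomes the next token (illustrative rule)."""
    received = W[active_idx]              # signal arriving at every other node
    return int(np.argmax(received))

def generate(start: str, steps: int) -> list[str]:
    idx = vocab.index(start)
    out = [start]
    for _ in range(steps):
        idx = sifu_step(idx)
        out.append(vocab[idx])
    return out

print(generate("the", 3))
```

Because decoding is a walk on a fixed graph rather than attention over a growing window, cost per step does not depend on sequence length, which is one way to read the "unlimited context without extra parameters" claim above.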
A Deep Dive into the GPT-5 Launch: The Backlash Against Overhyped Marketing and the Impasse in AI Breakthroughs
Hu Xiu· 2025-08-12 09:05
Core Insights
- GPT-5 has been released, but it does not represent a significant step towards Artificial General Intelligence (AGI) [1]
- The launch event revealed several issues, including presentation errors and reliance on debunked theories, which highlighted weaknesses in the Transformer architecture [1]
- Despite these shortcomings, GPT-5 is still considered a competent AI product, and OpenAI plans aggressive commercialization strategies in key sectors [1]

Technical Development
- The development of GPT-5 faced various technical bottlenecks, leading to the choice of a specific architecture to overcome these challenges [1]
- The limitations of the scaling law have been encountered, raising questions about future technological pathways for AI advancement [1]

Commercial Strategy
- OpenAI aims to rapidly establish a presence in three main application areas: education, healthcare, and programming [1]
- The company's approach suggests a focus on leveraging GPT-5's capabilities to solidify its market position [1]
Guotai Haitong | Industry: The Technical Evolution of AI Agents and Industry Insights
Core Insights
- The evolution of AI Agents is fundamentally driven by the paradigm shift towards large language models (LLMs) as the "brain," showcasing commercial value through vertical applications that address specific industry pain points with high precision [1][2]
- AI Agents are reshaping software development and human-computer interaction, transitioning from traditional architectures to modern LLM-based frameworks that enable autonomous planning, environmental perception, and tool invocation [1][2]

Technical Evolution
- The core of the AI Agent's technological advancement lies in the significant changes introduced by modern LLM architectures, moving away from traditional architectures limited by hardware and pre-programmed rules [2]
- The modern LLM-based agent architecture consists of three main modules: brain, perception, and action, allowing multiple specialized agents to collaborate or compete to overcome the limitations of single agents on complex tasks [2]

Industry Chain Formation
- A complete industry chain is emerging, with the upstream dominated by a few tech giants providing foundational models and computing power, while the midstream sees the rise of open-source frameworks and platforms that lower development barriers [3]
- Downstream applications fall into general-purpose agents for complex multi-step tasks and vertical agents deeply integrated with industry knowledge, showing significant commercial value in sectors like software development, law, finance, and healthcare [3]

Challenges and Future Trajectory
- Despite rapid advancements, AI Agents face challenges such as limits in LLMs' planning and reasoning capabilities, context window constraints, memory bottlenecks, multi-agent collaboration issues, and evaluation dilemmas [3]
- The future development of AI Agents will depend on the continuous evolution of foundational LLMs, the proliferation of multimodal perception capabilities, and the restructuring of the software and hardware ecosystem, moving closer to AGI [3]
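The brain/perception/action decomposition above can be made concrete with a minimal loop. This is a generic sketch of the pattern, not any specific framework's API; the class, tool names, and the stub keyword heuristic standing in for an LLM "brain" are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Minimal agent skeleton following the brain/perception/action split.
@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]           # action module: invocable tools
    memory: list[str] = field(default_factory=list)  # perception module: observation history

    def perceive(self, observation: str) -> None:
        self.memory.append(observation)

    def think(self) -> tuple[str, str]:
        """Stub 'brain': in a real agent this would be an LLM planning call;
        here a keyword rule picks a tool for the latest observation."""
        last = self.memory[-1]
        tool = "search" if "?" in last else "echo"
        return tool, last

    def act(self) -> str:
        tool, arg = self.think()
        result = self.tools[tool](arg)
        self.perceive(f"result: {result}")           # feed the outcome back as a new observation
        return result

agent = Agent(tools={
    "search": lambda q: f"top hit for {q!r}",
    "echo": lambda s: s,
})
agent.perceive("what is an AI agent?")
print(agent.act())
```

The multi-agent setups mentioned above amount to running several such loops whose `act` results are fed into each other's `perceive` calls.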
After GPT-5, Are We Closer to AGI or Further Away?
36Kr· 2025-08-08 07:10
Core Insights
- The release of GPT-5 marks a significant evolution in AI capabilities, transitioning from a focus on conversation to practical applications, described as a "philosophical revolution" in architecture [4][6]
- OpenAI aims to unify its models into a single intelligent system, eliminating the previous "model zoo" and enhancing user experience through a real-time routing mechanism [5][6]
- Despite the excitement surrounding GPT-5, user reactions are mixed: some praise its capabilities while others express disappointment, particularly in writing tasks [30][21]

Group 1: Model Features and Architecture
- GPT-5 introduces a unified intelligent system with a fast model for general queries and a deep reasoning model for complex problems, managed by a real-time router [5][6]
- The model supports a maximum input of 272,000 tokens and an output limit of 128,000 tokens, accommodating both text and image inputs [5]
- OpenAI has declared the end of older models, positioning GPT-5 as a highly coordinated and unified AI entity [6]

Group 2: Performance and User Experience
- Initial benchmark tests showed promising results for GPT-5, but discrepancies in the data presented during the launch event raised questions about its reliability [11][12]
- Users report that while GPT-5 excels in programming tasks, its writing capabilities do not match those of previous models like GPT-4.5, leading to a divide in user satisfaction [18][21]
- OpenAI has implemented new safety measures to reduce hallucinations and improve task reliability, although challenges remain in addressing prompt-injection attacks [27][29]

Group 3: Market Strategy and Pricing
- OpenAI's pricing strategy for GPT-5 is aggressive, with costs set at $1.25 per million input tokens, significantly lower than competitors, indicating a strategy to capture market share [17][16]
- The release of GPT-5 coincides with a surge in developer interest and new tools, suggesting a potential shift in the AI development landscape [14][30]
- The competitive pricing and enhanced capabilities position GPT-5 as a strong contender in the AI market, particularly for developers seeking reliable tools [16][30]
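A real-time router dispatching between a fast model and a deep-reasoning model can be sketched as follows. OpenAI has not published its routing logic, so the heuristic, model names, and thresholds below are placeholder assumptions; only the $1.25-per-million-input-tokens price and the 272,000-token input limit come from the summary above.

```python
# Toy router: send short, simple queries to a fast model and hard ones to a
# reasoning model. The routing rule is invented for illustration.
PRICE_PER_M_INPUT = 1.25  # USD per million input tokens, as quoted for GPT-5

def route(query: str) -> str:
    """Pick a backend model for a query (placeholder heuristic)."""
    hard_markers = ("prove", "step by step", "debug", "optimize")
    if len(query.split()) > 50 or any(m in query.lower() for m in hard_markers):
        return "reasoning-model"
    return "fast-model"

def input_cost_usd(num_tokens: int) -> float:
    """Input-side cost at the quoted GPT-5 price."""
    return num_tokens / 1_000_000 * PRICE_PER_M_INPUT

print(route("What time is it?"))
print(route("Prove that the sum of two odd numbers is even."))
print(input_cost_usd(272_000))  # cost of one maximal 272k-token input
```

At that price, even a maximal 272k-token input costs well under a dollar on the input side, which is the aggressive positioning the summary points to.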
You Can Clearly Feel That Programmer Interviews Have Changed...
猿大侠· 2025-07-23 03:25
Core Viewpoint
- The article emphasizes the importance of integrating existing programming skills with large model technologies to enhance career prospects in the AI field, rather than abandoning current skills [1]

Summary by Sections

Course Overview
- A course titled "Large Model Application Development Practical Training" is designed to help developers master AI application development from scratch through practical projects and code walkthroughs [1]
- The course includes insights from industry experts and real case studies from major companies, offering participants high-paying job opportunities and internal referrals [1][15]

Course Content
- The curriculum covers essential concepts such as RAG (Retrieval-Augmented Generation), AI Agents, and the Transformer architecture, focusing on practical applications and fine-tuning techniques [9][11]
- It consists of five modules: basics, tools, advanced topics, competitions, and practical applications, providing a comprehensive learning path [9]

Target Audience
- The course is aimed at developers looking to connect with product teams, build technical barriers, avoid job insecurity, and enhance their skills for future career development [13]
- It is particularly relevant for programmers concerned about job stability as they age, especially those nearing the 35-year mark [13]

Success Metrics
- The course has served over 20,000 students, receiving positive feedback and helping many secure high-paying job offers [11]
- Participants learn to customize models for specific industries such as manufacturing, healthcare, and finance, improving task accuracy and efficiency [11]

Practical Experience
- The course includes detailed case studies of popular AI applications, allowing participants to gain hands-on experience and build a portfolio of practical projects [16]
- Students will learn to implement AI technologies in various business scenarios, enhancing their employability [16]

Career Development
- The course offers insights into current job market trends for large model technologies, including salary expectations and career growth opportunities [20]
- Continuous internal referral opportunities are provided, giving participants a direct pathway to high-paying positions at leading companies [20]
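RAG, which this curriculum teaches, follows a simple pattern: retrieve documents relevant to the query, then condition generation on them. The sketch below shows only the retrieval-and-prompt-assembly half with keyword overlap standing in for embedding similarity; the corpus, scoring rule, and function names are illustrative assumptions, not the course's material.

```python
# Minimal RAG retrieval step. Real systems score documents with vector
# embeddings; word overlap is used here only to keep the sketch self-contained.
corpus = {
    "doc1": "transformers use self-attention with quadratic complexity",
    "doc2": "rag combines retrieval with generation for grounded answers",
    "doc3": "agents plan, perceive, and invoke tools autonomously",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the ids of the k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(q_words & set(kv[1].split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def build_prompt(query: str) -> str:
    """Paste retrieved context ahead of the question for the generator model."""
    context = "\n".join(corpus[d] for d in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(retrieve("how does rag retrieval work with generation"))
```

The assembled prompt is then sent to a large model, which is what grounds its answer in the retrieved domain documents (law, healthcare, finance, and so on in the course's examples).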
The Hiring Market for Programmers Has Recently Gone Crazy...
程序员的那些事· 2025-07-22 03:48
Core Viewpoint
- The article emphasizes the importance of integrating existing programming skills with large model technologies to enhance career prospects and salary opportunities in the AI field [1]

Group 1: Course Offerings
- A course titled "Large Model Application Development Practical Training" is designed to help developers master the complete AI application development process through practical projects and code walkthroughs [1]
- The course covers essential technologies such as RAG, AI Agents, and the Transformer architecture, providing a comprehensive learning path from basics to advanced applications [8]
- The course has served over 20,000 students and has received positive feedback, with many participants securing high-paying job offers [10]

Group 2: Learning Outcomes
- Participants will learn to fine-tune mainstream large models like DeepSeek and Qwen for specific scenarios, improving model performance and task accuracy [10]
- The course includes practical applications of RAG technology for efficient knowledge retrieval and generation in sectors such as law, healthcare, and finance [10]
- Students will also learn to design and develop AI Agents for multi-task collaboration and complex problem-solving in industry-specific contexts [10]

Group 3: Career Development
- The course aims to help participants build technical barriers, avoid job insecurity, and sustain their career development over the next 20 years [12]
- It offers insights into current job market trends, salary expectations, and career paths from the perspective of hiring managers [19]
- The program provides reliable internal referral opportunities and direct hiring benefits, facilitating quicker access to high-paying job offers [19]
The Job Market Has Cratered...
菜鸟教程· 2025-07-21 03:09
Core Viewpoint
- The article emphasizes the importance of integrating existing technical skills with large model applications to enhance career prospects in the AI era, rather than abandoning current expertise [2][3]

Summary by Sections

Current Industry Trends
- Many professionals in programming fields feel anxious about the rise of large models like GPT and DeepSeek, prompting a need to adapt and learn new skills [2]
- Despite layoffs and salary reductions, the trend towards AI application deployment is expected to continue, presenting opportunities for career advancement and salary increases [3]

Course Offerings
- A course titled "Large Model Application Development Practical Training" is introduced, designed to help developers master the complete AI application development process through practical projects and live instruction [3][4]
- The course covers essential technologies such as RAG, AI Agents, and the Transformer architecture, structured in five modules from basic to advanced levels [7]

Learning Outcomes
- Participants will learn to fine-tune mainstream large models for specific scenarios, use domain data for model customization, and apply RAG technology for efficient knowledge retrieval and generation [9]
- The course aims to build skills for developing AI Agents capable of multi-task collaboration and complex problem-solving across industry applications [9]

Success Metrics
- The course has served over 20,000 students, receiving positive feedback for its learning methods and outcomes, with many participants securing high-paying job offers [11]
- The program offers opportunities for networking with product teams, building technical barriers, and avoiding job insecurity, particularly for those approaching career milestones [13]

Additional Benefits
- Participants will receive access to real-world case studies and insights into high-demand AI applications, enhancing their practical experience and employability [14][16]
- The course includes direct referral opportunities to companies, increasing the chances of obtaining high-paying positions in the AI field [18]
Three Questions on AI (III), the Model Question | Facing the Model Question, Shaping AI's Future Together: The WAIC 2025 Large Model Forum Drives Technical Innovation by Tackling Open Problems
36Kr· 2025-07-17 03:21
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC) will take place from July 26 to 28 in Shanghai, focusing on three critical questions in AI: the mathematical question, the scientific question, and the model question, which aim to explore the essence of AI technology and its applications [3][4][5]

Group 1: Event Overview
- WAIC is a significant global event in the AI sector, promoting technological breakthroughs, industry integration, and deep dialogue on global governance [3]
- The event will feature a forum titled "Boundless Love, Shaping the Future," hosted by SenseTime, focusing on the "model question" and its implications for AI technology [3][4]

Group 2: Model Question Focus
- The "model question" series aims to create a global platform for top researchers and technical experts to discuss the intrinsic issues of AI models, particularly the relationship between model generalization and underlying architecture [4]
- The event will explore the integration of Transformer and non-Transformer architectures, addressing challenges such as semantic mismatches in multi-modal intelligence and optimizing performance-cost curves [5]

Group 3: Global Collaboration and Innovation
- The conference will gather leaders from academia and industry to discuss future trends and development paths for large model technologies, focusing on obstacles to achieving higher-level intelligence [6]
- Experts will discuss innovative solutions for model architecture and computational optimization, aiming to bridge the gap in multi-modal semantics and performance boundaries [6]
The "Shovel Sellers" Behind Tesla's and NVIDIA's Robots
虎嗅APP· 2025-07-06 03:31
Core Viewpoint
- The article discusses the rise of embodied intelligence and the critical role of data providers like CyberOrigin in the robotics industry, emphasizing that data is the new oil for the development of humanoid robots [3][5][23]

Group 1: Industry Trends
- The emergence of embodied AI has drawn significant interest from major companies like Tesla and NVIDIA, which are now focusing on humanoid robot development [11][20]
- The Transformer architecture has revolutionized the robotics field by enabling better spatial understanding and generalization, allowing robots to learn from vast amounts of data [12][13][14]

Group 2: Company Insights
- CyberOrigin, founded by Yin Peng, aims to become a leading data supplier for humanoid robots, focusing on real-world interaction data rather than just hardware [5][22]
- The company has established partnerships with major AI firms and is actively collecting millions of hours of real-world data to enhance robot training [25][26][29]

Group 3: Data Importance
- Data is essential for the evolution of both the physical robot and its cognitive capabilities; by analogy, models are the engine while data is the fuel [23][24]
- The company prioritizes collecting real-world data over synthetic data, believing that authentic data significantly improves model training outcomes [26][27]

Group 4: Challenges and Opportunities
- The robotics industry is currently in a chaotic phase, with many new entrants recognizing the value of data, leading to increased competition [51]
- The company acknowledges the long commercial chain in the robotics sector but believes that data can quickly form a commercial loop, making it a strategic focus [22][23]
Wall Street Scents a Quantum Investment Opportunity: Hot "Quantum Computing Stock" Rigetti Computing Wins an "Overweight" Rating
Zhi Tong Cai Jing· 2025-07-02 14:20
Core Insights
- Rigetti Computing has gained significant attention in the U.S. stock market after Cantor Fitzgerald initiated coverage with a "buy" rating and a $15 target price, signaling Wall Street's growing interest in quantum computing as a lucrative investment opportunity [1][2]
- The quantum computing sector is still in its infancy but is recognized as a highly sought-after technological milestone with potential for substantial economic impact [1][3]
- Major tech companies like NVIDIA, Microsoft, and IBM are investing heavily in quantum computing, signaling a competitive landscape and the potential for significant advances in commercial applications [1][4][8]

Company Developments
- Rigetti recently completed a $350 million stock issuance to strengthen its balance sheet [2]
- NVIDIA CEO Jensen Huang highlighted that quantum computing is approaching a critical technological turning point, with the potential to solve significant global problems in the coming years [4][5]
- Cisco has announced its entry into the quantum computing field by showcasing a prototype chip for networking quantum computers, indicating broadening interest in the sector [6]

Industry Trends
- The concept of a "Transformer moment" in quantum computing is emerging, referring to the arrival of controllable and commercially valuable quantum computing applications [7][8]
- Recent advances in technologies such as ion traps and quantum annealing are paving the way for practical quantum computing applications, moving from theoretical concepts to real-world implementations [7][8]
- The involvement of major tech giants and government support is expected to accelerate the commercialization of quantum computing on a global scale [8]