Transformer Architecture

The Job Market Has Crashed..
菜鸟教程· 2025-07-21 03:09
Core Viewpoint
- The article emphasizes the importance of integrating existing technical skills with large model applications to enhance career prospects in the AI era, rather than abandoning current expertise [2][3].
Summary by Sections
Current Industry Trends
- Many professionals in programming fields are feeling anxious about the rise of large models like GPT and DeepSeek, prompting a need to adapt and learn new skills [2].
- Despite layoffs and salary reductions, the trend towards AI application implementation is expected to continue, presenting opportunities for career advancement and salary increases [3].
Course Offerings
- A course titled "Large Model Application Development Practical Training" is introduced, designed to help developers master the complete AI application development process through practical projects and live instruction [3][4].
- The course covers essential technologies such as RAG, AI Agent, and Transformer architecture, structured in five modules from basic to advanced levels [7].
Learning Outcomes
- Participants will learn to fine-tune mainstream large models for specific scenarios, utilize domain data for model customization, and understand RAG technology for efficient knowledge retrieval and generation [9].
- The course aims to build skills for developing AI Agents capable of multi-task collaboration and complex problem-solving in various industry applications [9].
Success Metrics
- The course has served over 20,000 students, receiving positive feedback for its learning methods and outcomes, with many participants securing high-paying job offers [11].
- The program offers opportunities for networking with product teams, building technical barriers, and avoiding job insecurity, particularly for those approaching career milestones [13].
Additional Benefits
- Participants will receive access to real-world case studies and insights into high-demand AI applications, enhancing their practical experience and employability [14][16].
- The course includes direct referral opportunities to companies, increasing the chances of obtaining high-paying positions in the AI field [18].
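Three AI Questions ③ The Model Question | Facing the Model Question, Jointly Shaping the Future of AI with Boundless Love: WAIC 2025 Large Model Forum Leads Technological Innovation by Tackling Problems Head-On
36Kr· 2025-07-17 03:21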
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC) will take place from July 26 to 28 in Shanghai, focusing on three critical questions in AI: the mathematical question, the scientific question, and the model question, which aim to explore the essence of AI technology and its applications [3][4][5]
Group 1: Event Overview
- WAIC is a significant global event in the AI sector, promoting technological breakthroughs, industry integration, and deep dialogues on global governance [3]
- The event will feature a forum titled "Boundless Love, Shaping the Future," hosted by SenseTime, focusing on the "model question" and its implications for AI technology [3][4]
Group 2: Model Question Focus
- The "model question" series aims to create a global platform for top researchers and technical experts to discuss the intrinsic issues of AI models, particularly the relationship between model generalization and underlying architecture [4]
- The event will explore the integration of Transformer and non-Transformer architectures, addressing challenges such as semantic mismatches in multi-modal intelligence and optimizing performance-cost curves [5]
Group 3: Global Collaboration and Innovation
- The conference will gather leaders from academia and industry to discuss the future trends and development paths of large model technologies, focusing on obstacles to achieving higher-level intelligence [6]
- Experts will engage in discussions on innovative solutions for model architecture and computational optimization, aiming to bridge the gap in multi-modal semantics and performance boundaries [6]
The "Water Seller" Behind Tesla's and NVIDIA's Robots
虎嗅APP· 2025-07-06 03:31
Core Viewpoint
- The article discusses the rise of embodied intelligence and the critical role of data providers, like CyberOrigin, in the robotics industry, emphasizing that data is the new oil for the development of humanoid robots [3][5][23].
Group 1: Industry Trends
- The emergence of embodied AI has led to significant interest from major companies like Tesla and NVIDIA, which are now focusing on humanoid robot development [11][20].
- The Transformer architecture has revolutionized the robotics field by enabling better spatial understanding and generalization capabilities, allowing robots to learn from vast amounts of data [12][13][14].
Group 2: Company Insights
- CyberOrigin, founded by Yin Peng, aims to become a leading data supplier for humanoid robots, focusing on real-world interaction data rather than just hardware [5][22].
- The company has established partnerships with major AI firms and is actively collecting millions of hours of real-world data to enhance robot training [25][26][29].
Group 3: Data Importance
- Data is essential for the evolution of both the physical robot and its cognitive capabilities, with the analogy that models are engines while data is the fuel [23][24].
- The company prioritizes collecting real-world data over synthetic data, believing that authentic data significantly improves model training outcomes [26][27].
Group 4: Challenges and Opportunities
- The robotics industry is currently in a chaotic phase, with many new entrants recognizing the value of data, leading to increased competition [51].
- The company acknowledges the long commercial chain in the robotics sector but believes that data can quickly form a commercial loop, making it a strategic focus [22][23].
Wall Street Scents a Quantum Investment Opportunity: Popular "Quantum Computing Concept Stock" Rigetti Computing Receives an "Overweight" Rating
Zhi Tong Cai Jing· 2025-07-02 14:20
Core Insights
- Rigetti Computing has gained significant attention in the U.S. stock market after Cantor Fitzgerald initiated coverage with an "overweight" rating and a target price of $15, indicating Wall Street's growing interest in quantum computing as a lucrative investment opportunity [1][2]
- The quantum computing sector is still in its infancy but is recognized as a highly sought-after technological milestone with potential for substantial economic impact in the future [1][3]
- Major tech companies like NVIDIA, Microsoft, and IBM are heavily investing in quantum computing, signaling a competitive landscape and the potential for significant advancements in commercial applications [1][4][8]
Company Developments
- Rigetti recently completed a $350 million stock issuance to strengthen its balance sheet [2]
- NVIDIA's CEO Jensen Huang highlighted that quantum computing is approaching a critical technological turning point, with the potential to solve significant global issues in the coming years [4][5]
- Cisco has announced its entry into the quantum computing field by showcasing a prototype chip for connecting quantum computers, indicating a broadening interest in the sector [6]
Industry Trends
- The concept of a "Transformer moment" in quantum computing is emerging, which refers to the development of controllable and commercially valuable quantum computing applications [7][8]
- Recent advancements in technologies such as ion traps and quantum annealing are paving the way for practical quantum computing applications, moving from theoretical concepts to real-world implementations [7][8]
- The involvement of major tech giants and government support is expected to accelerate the commercialization of quantum computing on a global scale [8]
Wherever You Draw, It Moves! ByteDance Releases ATI, the "Magic Brush" of Video Generation, Now Open-Sourced!
机器之心· 2025-07-02 10:40
Core Viewpoint
- The article discusses the development of ATI, a new controllable video generation framework by ByteDance, which allows users to create dynamic videos by drawing trajectories on static images, transforming user input into explicit control signals for object and camera movements [2][4].
Group 1: Introduction to ATI
- Angtian Wang, a researcher at ByteDance, focuses on video generation and 3D vision, highlighting the advancements in video generation tasks due to diffusion models and transformer architectures [1].
- The current mainstream methods face a significant bottleneck in providing effective and intuitive motion control for users, limiting creative expression and practical application [2].
Group 2: Methodology of ATI
- ATI accepts two basic inputs: a static image and a set of user-drawn trajectories, which can be any shape, including lines and curves [6].
- The Gaussian Motion Injector encodes these trajectories into motion vectors in latent space, guiding the video generation process frame by frame [6][14].
- The model uses Gaussian weights to ensure that it can "see" the drawn trajectories and understand their relation to the generated video [8][14].
Group 3: Features and Capabilities
- Users can draw trajectories for key actions like running or jumping, with ATI accurately sampling and encoding joint movements to generate natural motion sequences [19].
- ATI can handle up to 8 independent trajectories simultaneously, ensuring that object identities remain distinct during complex interactions [21].
- The system allows for synchronized camera movements, enabling users to create dynamic videos with cinematic techniques like panning and tilting [23][25].
Group 4: Performance and Applications
- ATI demonstrates strong cross-domain generalization, supporting various artistic styles such as realistic films, cartoons, and watercolor renderings [28].
- Users can create non-realistic motion effects, such as flying or stretching, providing creative possibilities for sci-fi or fantasy scenes [29].
- The high-precision model based on Wan2.1-I2V-14B can generate videos comparable to real footage, while a lightweight version is available for real-time interactions in resource-constrained environments [30].
Group 5: Open Source and Community
- The Wan2.1-I2V-14B model version of ATI has been open-sourced on Hugging Face, facilitating high-quality, controllable video generation for researchers and developers [32].
- Community support is growing, with tools like ComfyUI-WanVideoWrapper available to optimize model performance on consumer-grade GPUs [32].
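The Gaussian weighting mentioned in the ATI methodology can be illustrated with a minimal sketch. This is not ATI's actual implementation: the function name `gaussian_weights`, the grid size, the `sigma` value, and the sample coordinates are all assumptions chosen for illustration. It only shows how one trajectory point per frame could be spread over a latent grid so that cells near the drawn path receive the most weight.

```python
import math

def gaussian_weights(point, height, width, sigma=1.5):
    """Return a height x width grid of normalized Gaussian weights centred on `point`.

    `point` is a (row, col) position on a latent grid. `sigma` is an assumed
    value that controls how far the trajectory's influence spreads.
    """
    py, px = point
    grid = [
        [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]
    total = sum(sum(row) for row in grid)
    # Normalize so the weights for one frame sum to 1.
    return [[w / total for w in row] for row in grid]

# One user-drawn trajectory, sampled once per frame (hypothetical coordinates).
trajectory = [(1.0, 1.0), (2.0, 3.0), (3.0, 5.0)]
per_frame_weights = [gaussian_weights(p, height=6, width=8) for p in trajectory]
```

Each entry of `per_frame_weights` is a soft mask for one frame. In a real system such a mask would presumably be combined with latent features, which this sketch does not attempt.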
A Roundup of Important LLM Papers Since the 2017 Transformer
机器之心· 2025-06-29 04:23
Core Insights
- The article discusses Andrej Karpathy's concept of "Software 3.0," where natural language becomes the new programming interface, and AI models execute specific tasks [1][2].
- It emphasizes the transformative impact of this shift on developers, users, and software design paradigms, indicating a new computational framework is being constructed [2].
Development of LLMs
- The evolution of Large Language Models (LLMs) has accelerated since the introduction of the Transformer architecture in 2017, leading to significant advancements in the GPT series and multimodal capabilities [3][5].
- Key foundational papers that established today's AI capabilities are reviewed, highlighting the transition from traditional programming to natural language interaction [5][6].
Foundational Theories
- The paper "Attention Is All You Need" (2017) introduced the Transformer architecture, which relies solely on self-attention mechanisms, revolutionizing natural language processing and computer vision [10][11].
- "Language Models are Few-Shot Learners" (2020) demonstrated the capabilities of GPT-3, establishing the "large model + large data" scaling law as a pathway to more general artificial intelligence [13][18].
- "Deep Reinforcement Learning from Human Preferences" (2017) laid the groundwork for reinforcement learning from human feedback (RLHF), crucial for aligning AI outputs with human values [15][18].
Milestone Breakthroughs
- The "GPT-4 Technical Report" (2023) details a large-scale, multimodal language model that exhibits human-level performance across various benchmarks, emphasizing the importance of AI safety and alignment [26][27].
- The release of LLaMA models (2023) demonstrated that smaller models trained on extensive datasets could outperform larger models, promoting a new approach to model efficiency [27][30].
Emerging Techniques
- The "Chain-of-Thought Prompting" technique enhances reasoning in LLMs by guiding them to articulate their thought processes before arriving at conclusions [32][33].
- "Direct Preference Optimization" (2023) simplifies the alignment process of language models by directly utilizing human preference data, making it a widely adopted method in the industry [34][35].
Important Optimizations
- The "PagedAttention" mechanism improves memory management for LLMs, significantly enhancing throughput and reducing memory usage during inference [51][52].
- The "Mistral 7B" model showcases how smaller models can achieve high performance through innovative architecture, influencing the development of efficient AI applications [55][56].
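The self-attention mechanism that "Attention Is All You Need" is built on can be sketched in a few lines. The version below is a simplified illustration, not the paper's full architecture: it implements scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain Python and leaves out the learned query/key/value projections, multiple heads, and masking that a real Transformer uses. The function names and the toy `tokens` input are assumptions for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Toy input: three 2-dimensional token vectors attending to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Because every output row is a weighted average of the value vectors, each token's new representation mixes in information from all other tokens in one step, which is the property the summary refers to.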
Your CamScanner, Valued at 21.7 Billion, Makes a Dash for a Hong Kong IPO
量子位· 2025-06-27 10:57
Core Viewpoint
- The company, Shanghai Hehe Information Technology, is aiming to become the "first stock of intelligent text recognition" in Hong Kong, following its previous listing on the A-share Sci-Tech Innovation Board. The company has shown significant growth in revenue and user engagement, positioning itself as a leader in the AI sector with a focus on text intelligence technology [2][3][4].
Financial Performance
- In 2024, the company reported a revenue of 1.438 billion RMB, a net profit of 400 million RMB, and a gross margin of 84.3% [4][25].
- The revenue growth from 2022 to 2024 was approximately 21% CAGR, with revenues of 989 million RMB, 1.187 billion RMB, and 1.438 billion RMB respectively [25].
- The C-end business accounted for a significant portion of total revenue, with contributions of 82.2%, 84.3%, and 83.8% from 2022 to 2024 [27].
User Engagement
- The monthly active users (MAU) for C-end products reached 171 million in 2024, with a paid user ratio of 4.3% [21].
- The company ranks first in China and fifth globally among efficiency AI companies with MAU exceeding 100 million [21][22].
Product Portfolio
- The company offers a range of products targeting both C-end and B-end markets, including CamScanner and CamCard for C-end, and "TextIn" and "Qixin Huayan" for B-end [8][12].
- The core technology is based on multi-modal text intelligence, which enhances efficiency in various applications [14][15].
Market Position
- The company is positioned as a leading AI firm with a focus on text recognition and processing, competing with major players like OpenAI, Google, Adobe, and Microsoft [5][6][21].
- The global AI product market is projected to grow significantly, with estimates of 46.5 billion USD in 2024 and 228 billion USD by 2029, indicating a robust growth trajectory for the industry [66].
Research and Development
- The company has been increasing its R&D investment, with expenditures of 280 million RMB, 323 million RMB, and 390 million RMB from 2022 to 2024, representing about 27% of total revenue [33].
- The workforce consists of 1,053 employees, with 60.6% in R&D roles, highlighting the company's commitment to innovation [35].
Future Plans
- The funds raised from the Hong Kong listing will primarily be used for R&D, international expansion, and exploring investment and acquisition opportunities [50].
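The growth figures quoted in this summary can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below uses only the numbers stated above. The helper name `cagr` is an assumption for illustration, and the market growth rate it derives is implied by the two quoted market sizes rather than stated in the summary.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Company revenue in million RMB, 2022 -> 2024 (two compounding periods).
revenue_cagr = cagr(989, 1438, years=2)

# Global AI product market in billion USD, 2024 -> 2029 (five periods).
# This rate is derived here, not quoted in the summary.
market_cagr = cagr(46.5, 228, years=5)

# R&D spend as a share of revenue for 2022, 2023, and 2024.
rd_share = [rd / rev for rd, rev in zip((280, 323, 390), (989, 1187, 1438))]

print(f"Revenue CAGR: {revenue_cagr:.1%}")
print(f"Implied market CAGR: {market_cagr:.1%}")
print("R&D share by year:", [f"{s:.1%}" for s in rd_share])
```

The revenue result lands close to the "approximately 21%" in the summary, and the yearly R&D shares sit near the "about 27%" figure, slightly higher in 2022.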
Shanghai AI Lab Director Zhou Bowen: Ten Questions About the Frontiers of Artificial Intelligence
机器人圈· 2025-06-26 10:46
Core Viewpoint
- The Shanghai Artificial Intelligence Laboratory aims to become a world-class research institution in the field of artificial intelligence, focusing on strategic, original, and forward-looking scientific research and technological breakthroughs [1].
Group 1: Conference Overview
- The inaugural Mingzhu Lake Conference, themed "Multidimensional Breakthroughs and Collaborative Innovation in Artificial Intelligence," took place from June 12-16, 2025, in Shanghai, attracting nearly 60 young scholars and industry leaders [5][48].
- The conference emphasizes the importance of problem discovery alongside problem-solving, as highlighted by the laboratory's director Zhou Bowen [3][16].
Group 2: Key Questions in AI
- Zhou Bowen presented ten critical questions regarding the frontiers of artificial intelligence, including the balance between overall intelligence and unit intelligence, resource allocation in deep reinforcement learning, and the relationship between agents and foundational models [4][19].
- The questions aim to address the challenges and opportunities in AI development over the next 3-5 years, focusing on the systematization, diversification, and advancement of intelligent capabilities [19][20].
Group 3: Importance of Scientific Community
- The establishment of the Xinghe Community is intended to foster collaboration and innovation among researchers, emphasizing the need for a platform that encourages the discovery and articulation of significant scientific questions [7][17].
- Historical examples illustrate the impact of scientific communities on innovation, highlighting the necessity of collective efforts in addressing complex scientific challenges [10][12][46].
Group 4: Strategic Scientist Emergence
- The emergence of strategic scientists is crucial for addressing major scientific challenges, as evidenced by historical examples where significant scientific advancements were achieved through collaborative efforts [46][47].
- The laboratory aims to cultivate strategic scientists by creating conditions that promote high-intensity input, concentrated task tackling, and dense talent development [47].
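In Tribute to Qian Xuesen, Chinese Scholars Develop Lingjing (灵境), an AI Virtual Reality Exercise System, to Tackle Adolescent Obesity and Reveal the Mechanisms by Which VR Exercise Reduces Weight and Promotes Brain Cognition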
生物世界· 2025-06-24 03:56
Core Viewpoint
- Adolescent obesity is a global public health crisis with rising prevalence, leading to increased risks of cardiovascular and metabolic diseases, as well as cognitive impairments [2]
Group 1: Research and Development
- A research team from Shanghai Jiao Tong University and other institutions developed the world's first VR-based exercise intervention system, REVERIE, aimed at overweight adolescents [4][8]
- The REVERIE system utilizes deep reinforcement learning and a Transformer-based virtual coach to provide safe, effective, and empathetic exercise guidance [4][8]
Group 2: Study Design and Methodology
- The study included a randomized controlled trial with 227 overweight adolescents, comparing outcomes between VR exercise, real-world exercise, and a control group [11]
- Participants were assigned to different groups, including VR and real-world sports, with all groups receiving uniform dietary management over an eight-week intervention [11]
Group 3: Results and Findings
- After eight weeks, the VR exercise group lost an average of 4.28 kg of body fat, while the real-world exercise group lost 5.06 kg, showing comparable results [13]
- Both VR and real-world exercise groups showed improvements in liver enzyme levels, LDL cholesterol, physical fitness, mental health, and exercise willingness [13]
- VR exercise demonstrated superior cognitive function enhancement compared to real-world exercise, supported by fMRI findings indicating increased neural efficiency and plasticity [14]
Group 4: Safety and Implications
- The injury rate in the VR exercise group was 7.69%, lower than the 13.48% in the real-world exercise group, with no severe adverse events reported [15]
- The REVERIE system is positioned as a promising solution for addressing adolescent obesity and promoting overall health improvements beyond weight loss [16][17]
Transformer Is "Ill-Adapted" to Embodied Intelligence: Strong Large Models ≠ Strong Robots
36Kr· 2025-06-18 11:55
Core Insights
- The year 2025 is anticipated to be the "Year of Embodied Intelligence," driven by significant events and advancements in robotics and AI technologies [1]
- There is a growing interest and investment in the field of general robotics, but concerns about sustainability and potential market bubbles persist [1]
- Experts are exploring the challenges and advancements in embodied intelligence, focusing on the gap between technological ideals and engineering realities [1]
Group 1: Industry Trends
- A surge in robotics startups and investments indicates a strong belief in the potential of general robotics [1][2]
- The transition from multi-modal large models to embodied intelligence is seen as a natural evolution, requiring substantial data and infrastructure improvements [3][4]
- Current AI models face limitations in multi-task scenarios, highlighting the need for better adaptability and learning mechanisms [5][6]
Group 2: Technical Challenges
- The high energy consumption and training costs of large models pose significant challenges for their application in robotics [4][5]
- There is a notable gap between the capabilities of large models and the multi-modal sensory systems of robots, complicating their integration [6][7]
- The industry is exploring both modular and end-to-end architectures for embodied intelligence, with a shift towards more unified systems [9][10]
Group 3: Research and Development
- Research is focused on bridging the gap between human, AI, and robotic intelligence, aiming for better collaboration and understanding [16][18]
- The current state of embodied intelligence is limited, with robots primarily executing pre-defined tasks rather than understanding human needs [18][19]
- Future developments may involve creating systems that can interpret human intentions directly, bypassing traditional communication methods [20][21]
Group 4: Future Outlook
- Experts believe that achieving true embodied intelligence will require overcoming significant technical hurdles, particularly in understanding and interacting with the physical world [23][24]
- The evolution of AI architectures, particularly beyond the current Transformer models, is essential for the long-term success of embodied intelligence [24][25]
- The next five to ten years are expected to be critical for advancements in both hardware and software, potentially leading to widespread adoption of household robots [31][32]