Multimodal Large Models

Communication Industry Weekly Report (2025-07-07 to 2025-07-13): Broadcom Management Meeting Gives Positive Guidance, Grok4 Officially Released; Recommend Watching Overseas Computing Power Chain Momentum - 2025-07-13
Huachuang Securities· 2025-07-13 08:33
Investment Rating
- The report maintains a "Recommendation" rating for the communication industry, expecting the industry index to outperform the benchmark index by more than 5% over the next 3-6 months [27].

Core Insights
- Broadcom's management has indicated a significant surge in AI inference demand, which is expected to drive a systematic reassessment of profit structures and industry capacity [13][15].
- AI inference demand has exceeded current capacity, suggesting potential upward revisions to Broadcom's profitability forecasts, with a projected market space of $60-90 billion across three major AI clients by 2027 [13][15].
- The report emphasizes the importance of high-bandwidth, low-latency networks for AI inference workloads, with a current spending ratio of approximately 3:1 between computing and networking devices in AI systems [14].

Market Performance
- The communication sector (Shenwan) rose 2.13% for the week, outperforming the CSI 300 index by 1.31 percentage points and underperforming the ChiNext index by 0.23 percentage points [6][7].
- Year-to-date, the communication sector has gained 9.54%, beating the CSI 300 index by 7.51 percentage points and the ChiNext index by 6.48 percentage points [6][7].

Stock Performance
- The week's top five gainers in the communication sector were SanChuan Wisdom (+27.91%), GuoYuan Technology (+27.72%), BoChuang Technology (+19.84%), ShiJia Photon (+14.85%), and ZhongXin SaiKe (+12.46%) [10][12].
- The top five losers were YouFang Technology (-18.52%), RuiSiKangDa (-13.48%), XuanJi Information (-7.31%), HeZhong ShiZhuang (-5.09%), and RunZe Technology (-4.10%) [10][12].

Investment Recommendations
- Key recommendations include the major operators China Mobile, China Telecom, and China Unicom [21].
- For optical modules and devices, the report recommends XinYiSheng, TianFu Communication, and ZhongJi XuChuang, and suggests watching YuanJie Technology and ShiJia Photon [21].
- In military/satellite communications, recommended companies include HaiGe Communication, Shanghai HanXun, and QiYiEr [21].
Top Internet Company's Embodied Intelligence Lab Is Hiring: Algorithm Roles in Multimodal Large Models, Robotic Multimodal Interaction, Reinforcement Learning, and More
具身智能之心· 2025-07-13 05:03
Core Viewpoint
- The company is recruiting for positions in embodied intelligence spanning multimodal large models, robotic multimodal interaction, and reinforcement learning, indicating a strong emphasis on innovation and application in robotics [1][3][5].

Group 1: Job Descriptions
- **Embodied Multimodal Large Model Researcher**: develops core algorithms for embodied intelligence, including multimodal perception, reinforcement learning optimization, and world model construction [1].
- **Robotic Multimodal Interaction Algorithm Researcher**: researches multimodal agents, reasoning and planning, and audio-visual dialogue models to innovate and apply robotic interaction technologies [3].
- **Reinforcement Learning Researcher**: explores multimodal large models and their applications in embodied intelligence, contributing to the development of next-generation intelligent robots [5].

Group 2: Job Requirements
- **Embodied Multimodal Large Model Researcher**: PhD or equivalent experience in a relevant field, with strong familiarity with robotics, reinforcement learning, and multimodal fusion [2].
- **Robotic Multimodal Interaction Algorithm Researcher**: master's degree or higher, excellent coding skills, and a solid foundation in algorithms and data structures [4].
- **Reinforcement Learning Researcher**: background in computer science or a related field, with a strong foundation in machine learning and reinforcement learning [6].

Group 3: Additional Qualifications
- Strong hands-on coding ability and awards in competitive programming (e.g., ACM, ICPC) are preferred [9].
- A keen interest in robotics and participation in robotics competitions are considered advantageous [9].
The 具身智能之心 Multimodal Large Model Discussion Group Is Now Open!
具身智能之心· 2025-07-12 13:59
The 具身智能之心 multimodal large model technical discussion group is here — students working in related directions are welcome to join! If you work on multimodal large models (V+L, V+L+tactile, etc.) and are engaged in fine-tuning, deployment, quantization, or lightweighting of embodied models, come exchange ideas with us. Scan the QR code to join the WeChat group; advertising without permission is prohibited and violators will be blacklisted across all platforms. If the group is full, add the assistant on WeChat (CLmovingup) and include the note "具身大模型+入群" to be invited in. ...
A VLM Job Interview That Absolutely Crushed Me...
自动驾驶之心· 2025-07-12 12:00
Core Viewpoint
- The article discusses advances in and applications of large models in autonomous driving, particularly the industry's adoption of multimodal large models and their potential for future development [2][4][17].

Group 1: Interview Insights
- The interview for a position at Li Auto involved extensive discussion of large models, from foundational concepts to practical applications in autonomous driving [2][4].
- The interviewer stressed private dataset construction and data collection methods, noting that data remains the core of the business model [4][6].

Group 2: Course Overview
- A course on multimodal large models is introduced, covering topics from general multimodal models to fine-tuning techniques and culminating in end-to-end autonomous driving applications [5][9][11].
- The course comprises chapters on the introduction to multimodal large models, foundational modules, general models, fine-tuning techniques, and specific applications in autonomous driving [9][11][17].

Group 3: Technical Focus
- The article outlines the technical aspects of multimodal large models, including architecture, training paradigms, and fine-tuning techniques such as Adapter and LoRA (a minimal LoRA sketch follows below) [11][15].
- It highlights the application of these models in autonomous driving, referencing algorithms such as DriveVLM, which is pivotal to Li Auto's end-to-end driving solution [17][19].

Group 4: Career Development
- The course also addresses career opportunities in the field, discussing potential employers, job directions, and the skills required for success in the industry [19][26].
- It emphasizes a solid foundation in deep learning and model deployment, along with practical coding skills [27].
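The summary names Adapter and LoRA as the course's fine-tuning techniques. As a point of reference, here is a minimal sketch of the LoRA idea — freeze a pretrained weight and train a low-rank additive update; the `LoRALinear` class name, rank, and dimensions are illustrative choices, not taken from the course materials.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer augmented with a trainable low-rank update.

    Computes y = base(x) + (alpha / r) * B(A(x)), where the base weight
    is frozen and only A (in -> r) and B (r -> out) are trained.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage: wrap a projection of a pretrained model, then train only A and B.
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
out = layer(torch.randn(2, 16, 4096))
print(out.shape)  # torch.Size([2, 16, 4096])
```

Because B is zero-initialized, the wrapped layer initially reproduces the pretrained behavior exactly, which is what makes this form of fine-tuning stable to start.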
Urgent Hiring at 之心! Recruiting 2025 Business Partners — Plenty of Openings~
自动驾驶之心· 2025-07-12 05:41
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving business, focusing on course development, research guidance, and hardware development [2][5].
- The main areas of expertise sought include large models, multimodal models, diffusion models, SLAM, 3D object detection, and closed-loop simulation [3].
- Candidates from QS200 universities with a master's degree or higher are preferred, especially those with significant contributions at top conferences [4].

Group 2
- Partner benefits include resource sharing for job seeking, PhD recommendations, and overseas study opportunities, along with substantial cash incentives [5].
- There are opportunities to collaborate on entrepreneurial projects [5].
- Interested parties are encouraged to make contact via WeChat for further inquiries [6].
Escape Rooms Become AI's New Testing Ground: Pass Rates Below 50% Expose Spatial Reasoning Weaknesses | Tsinghua, ICCV25
量子位· 2025-07-12 04:57
Core Insights
- The article discusses the rapid development of multimodal large language models (MLLMs) and their capabilities on complex visual reasoning tasks, evaluated through a new platform called EscapeCraft [1][2].

EscapeCraft Environment
- EscapeCraft is a 3D escape-room environment designed to assess the reasoning abilities of MLLMs by requiring them to explore, find items, and unlock exits by integrating visual, spatial, and logical information [4][5].
- The platform allows customizable difficulty levels and supports tasks such as question answering, logical reasoning, and narrative reconstruction [6][5].

Model Performance Evaluation
- The evaluation focuses on the entire task-completion process rather than just the final outcome, assessing whether models can explore autonomously, avoid repeating mistakes, and use tools effectively [16].
- Metrics such as Intent-Outcome Consistency and various interaction ratios are introduced to measure the quality of model interactions and reasoning efficiency (see the illustrative sketch after this summary) [17].

Model Comparison Results
- The study compares models including GPT-4o, Gemini-1.5 Pro, and Claude 3.5, finding that while GPT-4o has the highest escape success rate, it still makes frequent errors as task complexity increases [21][20].
- Models often struggle with spatial awareness and decision-making, exhibiting distinctive failure patterns such as misjudging interactive objects or failing to act on visible clues [22][18].

Conclusion
- EscapeCraft serves as a versatile evaluation platform for future research on intelligent agents, multimodal reasoning, and reinforcement learning, providing a foundation for further advances in the field [5][4].
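The summary names Intent-Outcome Consistency and interaction ratios but gives no formulas. Below is one plausible, simplified formulation of such trajectory-level metrics; the `Step` schema, field names, and exact definitions are assumptions made for illustration, not the paper's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One agent step in an EscapeCraft-style trajectory (illustrative schema)."""
    intent: str           # action the model said it would take, e.g. "pick_up_key"
    outcome: str          # action the environment actually registered
    is_interaction: bool  # whether the step touched an object (vs. pure movement)
    succeeded: bool       # whether the interaction had its intended effect

def intent_outcome_consistency(traj: list[Step]) -> float:
    """Fraction of steps whose executed outcome matches the stated intent."""
    if not traj:
        return 0.0
    return sum(s.intent == s.outcome for s in traj) / len(traj)

def effective_interaction_ratio(traj: list[Step]) -> float:
    """Among interaction steps, fraction that actually achieved their effect."""
    inter = [s for s in traj if s.is_interaction]
    return sum(s.succeeded for s in inter) / len(inter) if inter else 0.0

traj = [
    Step("move_forward", "move_forward", False, True),
    Step("pick_up_key", "pick_up_key", True, True),
    Step("unlock_door", "bump_wall", True, False),  # misjudged an object
]
print(intent_outcome_consistency(traj))   # ~0.667
print(effective_interaction_ratio(traj))  # 0.5
```

Metrics of this shape reward agents whose declared plans actually land, which matches the article's emphasis on evaluating the process rather than only the final escape.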
IPO News | Zhipu Reportedly Considering Moving Its IPO Venue from the Mainland to Hong Kong, May Raise About $300 Million
智通财经网· 2025-07-11 08:31
Group 1
- Zhipu (智谱) is considering moving its planned IPO from mainland China to Hong Kong, potentially raising around $300 million (approximately HKD 2.34 billion) [1].
- Zhipu has received strategic investments from various state-owned enterprises, including a combined CNY 1 billion from Pudong Venture Capital Group and Zhangjiang Group [1].
- The company has launched the visual language model GLM-4.1V-Thinking, designed for complex cognitive tasks, and introduced a new ecosystem platform called "Agent Application Space" [1].

Group 2
- In March 2025, Zhipu completed a strategic financing round exceeding CNY 1 billion, with participants including state-owned enterprises from Hangzhou [2].
- Zhipu received a strategic investment of CNY 500 million from Zhuhai Huafa Group, a state-owned enterprise controlled by the Zhuhai municipal government [2].
- Chengdu High-tech Zone announced a strategic investment of CNY 300 million in Zhipu [2].
Registration Open | On July 27, the World Artificial Intelligence Conference Tencent Forum Invites You to Explore a New Era of AI
腾讯研究院· 2025-07-11 07:20
Core Viewpoint
- The article emphasizes the transformative impact of artificial intelligence (AI) across industries, highlighting its rapid integration into daily life and anticipating further breakthroughs in AI capabilities by 2025 [1][2].

Group 1: AI Development and Trends
- In 2024, the integration and explosive application of generative AI deepened, with new technological paradigms such as multimodal large models and embodied intelligence emerging [1].
- The upcoming 2025 World Artificial Intelligence Conference will focus on the theme of "Intelligent Emergence," addressing the deep integration of global AI technology and industry [2].

Group 2: Conference Highlights
- The conference will cover three core topics: vertical implementation of large models, innovative breakthroughs in application scenarios, and collaborative ecosystem building [2].
- Tencent will showcase its AI application achievements across diverse scenarios, reflecting its commitment to "technology for good" [2].

Group 3: Engagement and Participation
- The event is positioned not only as a technology showcase but also as a platform for intellectual exchange, inviting participants to witness the latest developments in AI [3].
Sci-Tech Innovation AI ETF (588790) Rises 1.78%, Its Average Daily Turnover Over the Past Year Beating Comparable Products; Institutions: The Singularity of Multimodal Large Model and Application Development Is Approaching
Sina Finance · 2025-07-11 05:43
Core Viewpoint
- The AI sector is experiencing significant growth, as evidenced by the performance of the Sci-Tech Innovation AI ETF and the developments showcased at the Global AI Summit in Geneva [3][4][5].

Group 1: Market Performance
- As of July 11, 2025, the Sci-Tech Innovation AI Index rose 1.93%, with notable gains in constituent stocks such as Star Ring Technology (+13.26%) and Cambricon (+5.48%) [3].
- The Sci-Tech AI ETF (588790) rose 1.78% to a latest price of 0.57 yuan and has gained a cumulative 2.56% over the past three months, ranking 3rd among comparable funds [3].
- The ETF's latest scale reached 44.48 billion yuan, a new high since inception and 1st among comparable funds [4].

Group 2: Fund Flow and Investment Trends
- The ETF recorded a net inflow of 50.54 million yuan, with net inflows on four of the last five trading days totaling 118 million yuan [4].
- The latest financing buy-in for the ETF was 13.30 million yuan, with a financing balance of 252 million yuan, indicating continued interest from leveraged funds [4].

Group 3: Historical Performance and Fees
- The ETF's net value has risen 10.72% over the past six months, with a maximum single-month return of 15.59% since inception [5].
- The management fee is 0.50% and the custody fee 0.10%, relatively low among comparable funds [5].
- The tracking error over the past six months is 0.030%, the highest tracking precision among comparable funds [5].

Group 4: Index Composition
- The Sci-Tech Innovation AI Index comprises 30 large-cap companies that provide foundational resources, technology, and application support for the AI sector [6].
- As of June 30, 2025, the top ten weighted stocks accounted for 68.03% of the index, led by companies such as Cambricon and Lanke Technology [7].
ICML 2025 Spotlight | Kuaishou and Nankai University Jointly Propose Modular Duplex Attention, Significantly Boosting Multimodal Large Models' Emotion Understanding!
AI前线· 2025-07-11 05:20
Core Insights
- The article argues that "emotional intelligence" is a crucial direction for the next generation of artificial intelligence and a significant step toward general artificial intelligence, noting that digital humans and robots must accurately interpret multimodal interaction cues and probe human emotional states to achieve more realistic, natural human-machine dialogue [1].

Group 1: Technological Advancements
- The Kuaishou team and Nankai University have produced groundbreaking research in multimodal emotion understanding, identifying key shortcomings of existing multimodal large models in capturing emotional cues [1].
- They propose a new modular duplex attention paradigm, realized in a multimodal model named MODA, which significantly enhances perception, cognition, and emotion capabilities across a range of tasks (an illustrative sketch of the underlying idea follows below) [1][7].
- MODA shows marked performance improvements on 21 benchmarks spanning six task categories: general dialogue, knowledge Q&A, table processing, visual perception, cognitive analysis, and emotion understanding [1][28].

Group 2: Attention Mechanism Challenges
- Existing multimodal large models exhibit a modality bias stemming from language-centric pre-training, which hampers their ability to focus on fine-grained emotional cues and degrades performance on advanced tasks requiring detailed cognitive and emotional understanding [4][7].
- The study finds that attention scores in multimodal models skew toward text tokens, producing large discrepancies in attention distribution across layers, with cross-modal attention differences reaching up to 63% [4][8].

Group 3: Performance Metrics
- The modular duplex attention paradigm effectively mitigates attention misalignment, reducing cross-modal attention differences from 56% and 62% to 50% and 41%, respectively [25].
- MODA, at parameter scales of 8 billion and 34 billion, achieves significant gains across tasks in content perception, role cognition, and emotion understanding [25][28].

Group 4: Practical Applications
- MODA shows strong potential in human-machine dialogue, analyzing user micro-expressions, tone, and cultural background in real time to build multidimensional character profiles and understand emotional context [31].
- The model has been deployed in Kuaishou's data perception project, significantly enhancing data analysis, particularly emotion recognition and reasoning, and improving the accuracy of emotional-change detection and personalized recommendation [33].
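The summary describes text-biased attention and its mitigation but does not spell out MODA's mechanism. The sketch below illustrates the general idea of modality-wise attention renormalization — softmax within each modality's key block, then equal weighting across modalities so text tokens cannot absorb nearly all of the mass. This is an assumption-laden illustration of that general idea, not the paper's actual modular duplex attention; the function name and schema are hypothetical.

```python
import torch

def modality_balanced_attention(scores: torch.Tensor,
                                modality_ids: torch.Tensor) -> torch.Tensor:
    """Renormalize attention so no single modality dominates.

    scores:       (num_queries, num_keys) raw attention logits
    modality_ids: (num_keys,) integer modality of each key (0=text, 1=vision, ...)
    Softmax is taken within each modality's key block, then the blocks are
    averaged, giving every modality an equal share of total attention mass.
    """
    out = torch.zeros_like(scores)
    mods = modality_ids.unique()
    for m in mods:
        mask = modality_ids == m
        out[:, mask] = torch.softmax(scores[:, mask], dim=-1)
    return out / len(mods)  # each modality contributes equal total weight

scores = torch.randn(4, 10)                 # 4 queries, 10 keys
modality_ids = torch.tensor([0]*6 + [1]*4)  # 6 text keys, 4 vision keys
attn = modality_balanced_attention(scores, modality_ids)
print(attn.sum(dim=-1))  # each row still sums to 1.0
```

Under plain softmax over all keys, six strongly-scored text tokens could capture almost all attention; the per-modality renormalization caps text at half the mass here, which is the flavor of rebalancing the reported 63% → 41-50% cross-modal gap reduction suggests.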