Multimodal Large Models

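A-share indices open higher across the board: Shanghai Composite edges up 0.05%, with rare-earth permanent magnet and stablecoin sectors leading the gains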
Feng Huang Wang Cai Jing· 2025-07-11 01:38
Market Overview
- Major indices in China opened higher, with the Shanghai Composite Index up 0.05%, Shenzhen Component Index up 0.06%, and ChiNext Index up 0.02% [1]
- The Shanghai Composite Index reached 3,511.37 points, while the Shenzhen Component Index was at 10,637.45 points [2]
US Market Performance
- US stock indices collectively rose, with the Dow Jones up 0.43% at 44,650.64 points, S&P 500 up 0.27% at 6,280.46 points, and Nasdaq up 0.09% at 20,630.66 points, marking new highs [3]
- Chinese concept stocks saw a general increase, with notable gains in companies like ZTO Express (up 9.21%) and Beike (up 6.52%) [3]
Industry Insights
- Huatai Securities remains optimistic about the upward trend in copper prices, viewing recent price corrections as potential buying opportunities, especially in light of upcoming tariffs on copper [4]
- CICC suggests investors focus on performance and valuation recovery opportunities in the electric grid and industrial control sectors in the second half of the year, highlighting sustained investment growth in the electric grid [5]
- CITIC Securities indicates that despite high valuations in US stocks, there may be opportunities for investment, particularly in technology and telecommunications sectors, as the market adjusts to tariff impacts [6]
Technological Developments
- Huatai Securities predicts a significant turning point in the development of multimodal large models and applications, driven by advancements in technology and commercial progress [7]
- The firm emphasizes the importance of recognizing the mainstream adoption of native multimodal architectures and the need to focus on global advancements in AI commercialization [8]
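The world's most powerful AI model? Musk releases Grok 4! Fund 589520, heavily invested in the domestic AI industry chain, draws 39.22 million yuan in a single day!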
Xin Lang Ji Jin· 2025-07-11 01:17
Group 1: AI Model Development
- xAI's Grok 4 achieved an accuracy rate of 25.4% in "Humanity's Last Exam," surpassing Google's Gemini 2.5 Pro at 21.6% and OpenAI's o3 at 21% [1]
- The emergence of multi-modal large models is expected to create significant investment opportunities in both computational power and applications [1]
- The AI sector is likely to see further catalytic events in the second half of the year, including the release of new models and platforms from companies like OpenAI and NVIDIA [1]
Group 2: Investment Trends
- The AI investment trend is gaining momentum, particularly following NVIDIA's market capitalization reaching 4 trillion US dollars [2]
- The Huabao ETF, focused on the domestic AI industry chain, saw a net inflow of 39.22 million yuan on July 10, with 8 out of the last 10 trading days showing net inflows totaling 50.65 million yuan [2]
- Analysts emphasize the importance of sharing in the benefits of the AI era and recognizing the long-term investment value in the rapidly evolving AI technology landscape [4]
Group 3: Domestic AI Development
- Domestic AI model DeepSeek has made significant advancements, breaking through overseas computational barriers and establishing a foundation for local AI companies [5]
- The Huabao ETF is strategically positioned in the domestic AI industry chain, benefiting from the acceleration of AI integration in edge computing and software [5]
The salaries for end-to-end VLA roles have me tempted...
自动驾驶之心· 2025-07-10 12:40
Core Viewpoint
- End-to-End Autonomous Driving (E2E) is the core algorithm for intelligent driving mass production, marking a new phase in the industry with significant advancements and competition following the recognition of UniAD at CVPR [2]
Group 1: E2E Autonomous Driving Overview
- E2E can be categorized into single-stage and two-stage approaches, directly modeling from sensor data to vehicle control information, thus avoiding error accumulation seen in modular methods [2]
- The emergence of BEV perception has bridged gaps between modular methods, leading to a significant technological leap [2]
- The rapid development of E2E has led to a surge in demand for VLM/VLA expertise, with potential salaries reaching millions annually [2]
Group 2: Learning Challenges
- The fast-paced evolution of E2E technology has made previous learning materials outdated, necessitating a comprehensive understanding of multi-modal large models, BEV perception, reinforcement learning, and more [3]
- Beginners face challenges in synthesizing knowledge from numerous fragmented papers and transitioning from theory to practice due to a lack of high-quality documentation [3]
Group 3: Course Development
- A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address learning challenges, focusing on Just-in-Time Learning to help students quickly grasp core technologies [4]
- The course aims to build a framework for research capabilities, enabling students to categorize papers and extract innovative points [5]
- Practical applications are integrated into the course to ensure a complete learning loop from theory to practice [6]
Group 4: Course Structure
- The course consists of multiple chapters covering the history and evolution of E2E algorithms, background knowledge, two-stage and one-stage E2E methods, and the latest advancements in VLA [8][9][10]
- Key topics include the introduction of E2E algorithms, background knowledge on VLA, and practical applications of diffusion models and reinforcement learning [11][12]
Group 5: Target Audience and Outcomes
- The course is designed for individuals with a foundational understanding of autonomous driving and aims to elevate participants to a level comparable to one year of experience as an E2E algorithm engineer [19]
- Participants will gain a deep understanding of key technologies such as BEV perception, multi-modal large models, and reinforcement learning, enabling them to apply learned concepts to real-world projects [19]
SenseTime's Li Xingye: "what you see is what you get" multimodal large models make human-computer interaction smoother
Bei Ke Cai Jing· 2025-07-10 11:49
Core Insights
- The article discusses the evolution of artificial intelligence from 1.0 to 2.0, highlighting SenseTime's breakthroughs in multimodal interaction technology and its applications across various sectors [1][2].
Group 1: AI Evolution
- SenseTime has transitioned from focusing on computer vision in the AI 1.0 era to promoting multimodal interaction innovations in the AI 2.0 era, driven by the rise of large model technologies in 2023 [1].
- The concept of "what you see is what you get" is emphasized, integrating video, images, and voice to enable real-time interaction with humans [1].
Group 2: Applications in Education
- In the education sector, SenseTime collaborates with learning device manufacturers to develop interactive devices that utilize real-time algorithms to assist children in solving problems and recognizing errors [2].
- The system supports interactive storytelling for young children by converting images into narratives, and SenseTime has partnered with around 10 schools to create smart campus assistants for managing course schedules and grade inquiries [2].
Group 3: Intelligent Applications
- SenseTime's intelligent applications include algorithms that analyze industry data to assist in warehouse leasing scenarios and generate lease management solutions [2].
- In customer service, SenseTime collaborates with well-known operators to create efficient intelligent agents, and in smart home applications, it enhances family interaction through AI technology [2].
- The advantage of multimodal large models lies in enabling smoother interactions beyond text command recognition, utilizing visual and multidimensional information [2].
Openings at several top embodied-AI companies: large models, reinforcement learning, VLA, and embodied navigation roles!
具身智能之心· 2025-07-10 03:36
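We recently spoke with several companies that have openings related to large models, reinforcement learning, and navigation, and are sharing them here. The positions are reliable: the employers are well-funded unicorn companies in the embodied-AI field. Anyone interested can scan the QR code at the bottom to learn more.
1) Multimodal large models
Base: Beijing, Shenzhen
Salary: 40k-80k per month
Focus: mobile manipulation, navigation, VLA, etc.
Job description:
1. Research and develop frontier multimodal large-model algorithms for embodied intelligence, applied to mobile manipulation platforms across multiple indoor and outdoor scenarios. This includes, but is not limited to, framework design for embodied large models, model optimization, and training and deployment for downstream tasks such as navigation and manipulation;
2. Explore and advance the technology and demos of large language models and multimodal large models in robotics.
Requirements:
1. Master's degree or above in computer science, artificial intelligence, robotics, control engineering, or a related field;
2. Extensive working experience in robot perception/navigation/manipulation, large language models, or multimodal large models;
3. Familiarity with frontier VLM/VLN/VLA multimodal model algorithms in embodied intelligence, with independent judgment and the ability to analyze and solve challenging practical problems;
4. Experience developing and deploying algorithms that apply multimodal large models to robot navigation, such as NaVid/MobilityVLA, is preferred;
5. Solid frontier algorithm R&D skills and efficient engineering implementation, with the ability to bring technology into production quickly;
6. Good teamwork skills ...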
Huatai Securities Morning Briefing - 20250710
HTSC· 2025-07-10 01:44
Core Insights
- The report highlights a potential narrowing of the decline in PPI in the second half of 2025, with June CPI showing a slight improvement to 0.1% year-on-year, compared to a previous value of -0.1% [2]
- Global manufacturing PMI has rebounded above the growth line, indicating an overall recovery in manufacturing activity, particularly in developed economies [2]
- The report emphasizes the importance of monitoring the performance of various sectors, particularly those expected to benefit from the "anti-involution" policies and improving economic conditions [4]
Macroeconomic Overview
- June CPI in China improved to 0.1% year-on-year, while PPI decreased by 3.6% year-on-year, indicating a mixed inflationary environment [2]
- Global manufacturing PMI showed a notable increase, with developed markets improving while some emerging markets like Vietnam and Indonesia showed marginal declines [2]
Sector Analysis
Fixed Income
- The report discusses the impact of "anti-involution" policies on PPI and CPI, suggesting a potential stabilization in prices, with CPI expected to rise slightly to around 0.5% by Q4 2025 [5]
- The report notes that the demand side remains critical for price elasticity, with industry self-discipline and private enterprise willingness being key factors [5]
Machinery and Equipment
- The report indicates a recovery in excavator sales, with June sales reaching 18,800 units, a year-on-year increase of 13.3%, driven by strong export growth [8]
- The growth in second-hand excavator exports is expected to stimulate domestic replacement demand, benefiting leading companies in the sector [8]
Agriculture
- The report highlights ongoing "anti-involution" efforts in the pig farming industry, which may lead to inventory release and improved profitability for high-quality pig farming companies [9]
- The report suggests that the pig farming sector may gradually transition to a phase of high-quality competition, with recommendations for companies like Muyuan Foods and Wens Foodstuffs [9]
Renewable Energy and Equipment
- The report anticipates strong growth for offshore wind energy, with a significant increase in orders expected to drive performance for leading companies in the sector [19]
- The report emphasizes the importance of technological advancements and capacity expansion in the offshore wind sector [19]
Electronics and Chemicals
- The report forecasts a substantial increase in net profit for Shengquan Group in the first half of 2025, driven by strong demand for electronic materials [20]
- The report maintains a positive outlook on the company's growth trajectory, supported by favorable market conditions [20]
Company-Specific Insights
- Zhaojin Mining is rated as a "buy" with a target price of 23.44 HKD, driven by expected production growth and favorable gold price trends [15]
- Harbin Electric is also rated as a "buy," with anticipated recovery in equipment demand across various energy sectors [15]
- MGM China is highlighted for its strong performance in the non-gaming segment, benefiting from increased tourist traffic and successful entertainment events [17]
Special forum on frontiers in pattern recognition and artificial intelligence held
Huan Qiu Wang Zi Xun· 2025-07-09 08:43
Group 1
- The forum focused on national strategic needs and technological frontiers in the fields of pattern recognition and artificial intelligence, gathering nearly 20 experts and representatives from renowned universities, research institutes, and leading enterprises in China [1][3]
- The event aimed to foster the cultivation of new productive forces and interdisciplinary integration, injecting new momentum into scientific research innovation and the collaborative development of academic journals [1]
Group 2
- Various professors presented specialized reports, including topics such as "3D/4D content creation for arbitrary sparse data," "embodied intelligent robots with emotional intelligence," and "visual perception in unmanned systems" [5][7][11]
- A roundtable discussion was held, focusing on new trends and challenges in multimodal large models and generative artificial intelligence, addressing the transformation of research paradigms and talent cultivation in the era of large models [15]
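Multimodal models learn to "search on demand": 30% fewer searches and better accuracy! New ByteDance & NTU research optimizes the search strategy of multimodal models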
量子位· 2025-07-08 07:30
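Contributed by the MMSearch-R1 team
量子位 | WeChat official account QbitAI
Multimodal models have learned to "search on demand"!
The latest research from ByteDance and NTU optimizes the search strategy of multimodal models: by building web search tools, constructing a multimodal search dataset, and designing a simple and effective reward mechanism, it makes a first attempt at training multimodal models to search autonomously with end-to-end reinforcement learning.
The trained model can decide on its own when to search, what to search for, and how to process the search results, and it performs multi-turn, on-demand search in a real internet environment.
Experimental results show that the MMSearch-R1 system has a clear advantage on knowledge-intensive visual question answering (VQA) tasks: it not only outperforms same-size models running a traditional retrieval-augmented generation (RAG) workflow, but also matches the performance of larger models using traditional RAG while making about 30% fewer searches.
The rest of the article explains the research method and the experimental findings in detail.
How exactly is it done?
In recent years, as vision-language training datasets have improved in both scale and quality, large multimodal models (LMMs) have shown excellent performance on cross-modal understanding tasks, and their ability to align textual and visual knowledge has grown markedly.
However, real-world information is highly dynamic and complex, ...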
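The summary above mentions a simple reward mechanism for learning when to search but does not give its form. As a rough illustration only, one way such a reward could be shaped is to score answer correctness, discount correct answers that relied on a search call, and add a small bonus for well-formed output. The function name `search_aware_reward`, the parameters `search_penalty` and `format_weight`, and their default values are all hypothetical and are not taken from the MMSearch-R1 work.

```python
def search_aware_reward(
    answer_correct: bool,
    used_search: bool,
    format_ok: bool,
    search_penalty: float = 0.1,  # hypothetical discount for calling search
    format_weight: float = 0.1,   # hypothetical weight of the format bonus
) -> float:
    """Toy reward that prefers correct answers obtained without searching.

    This is an illustrative sketch, not the reward used by MMSearch-R1.
    """
    # Base score: 1 for a correct final answer, 0 otherwise.
    accuracy = 1.0 if answer_correct else 0.0
    # Discount the accuracy term when the model needed a search call,
    # so unnecessary searches are discouraged during training.
    if used_search:
        accuracy *= 1.0 - search_penalty
    # Blend in a small bonus for following the expected output format.
    format_score = 1.0 if format_ok else 0.0
    return (1.0 - format_weight) * accuracy + format_weight * format_score


if __name__ == "__main__":
    for correct, searched in [(True, False), (True, True), (False, True)]:
        reward = search_aware_reward(correct, searched, format_ok=True)
        print(f"correct={correct} searched={searched} reward={reward:.2f}")
```

Under these assumed weights, a correct answer given without searching scores highest, a correct answer that needed a search scores slightly lower, and a wrong answer earns only the format bonus, which is the ordering an on-demand search policy would need to learn from.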
Z Tech | VAST, a world-leading multimodal large model, is recruiting at top pay to define the technical paradigm of the next decade
Z Potentials· 2025-07-08 02:50
Group 1
- The company is currently recruiting a new batch of interns to enhance its workforce and bring in fresh talent [2]
- The company is seeking creative individuals born after 2000 to drive entrepreneurial initiatives [4]
- Z Potentials is a focus area for the company, indicating a strategic interest in developing new opportunities and innovations [5]
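Understanding complex spatial instructions in an instant? RoboRefer lets robots understand and reason about space and act precisely even in the open world!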
机器之心· 2025-07-06 06:06
Core Viewpoint
- The article discusses the development and capabilities of RoboRefer, a multimodal large model designed for spatial referring tasks in robotics, emphasizing its advanced spatial understanding and reasoning abilities.
Group 1: RoboRefer Model Overview
- RoboRefer is a multimodal large model that possesses three-dimensional spatial understanding and reasoning capabilities, featuring independent image and depth encoders [12]
- The model can accurately answer various spatial perception questions and perform complex combinatorial reasoning based on multiple spatial relationships [12][13]
Group 2: Training Techniques
- RoboRefer employs full-parameter supervised fine-tuning (SFT) to enhance spatial perception and reinforcement fine-tuning (RFT) to improve generalized reasoning capabilities [15][16]
- The model's training includes a process-based reward function that enhances the quality of intermediate reasoning processes, leading to improved multi-step reasoning abilities [17]
Group 3: Performance Metrics
- After SFT training, RoboRefer achieved an average success rate of 89.6% in spatial understanding tasks, setting a new state of the art [21]
- In the high-difficulty spatial referring task benchmark RefSpatial-Bench, RFT-trained RoboRefer outperformed all other models, surpassing Gemini-2.5-Pro by 17.4% in average accuracy [22]
Group 4: Dataset Development
- The research team created a large-scale, high-quality dataset called RefSpatial, which includes 2.5 million samples and 20 million question-answer pairs, significantly larger than similar datasets [20]
- RefSpatial features detailed multi-step reasoning processes and covers a wide range of everyday interaction scenarios, integrating 31 types of spatial relationships [20]
Group 5: Real-World Application
- RoboRefer can be flexibly integrated into various types of robots, such as UR5 robotic arms and G1 humanoid robots, enabling precise execution of complex, dynamic, multi-step tasks in real-world environments [9]