Multimodal Large Models
Alibaba Group Vice President Xu Zhuhong: Multimodal Large Models Are a Key Path to AGI | Live from MWC Shanghai 2025
Guo Ji Jin Rong Bao· 2025-06-19 10:48
Core Insights
- The era of multimodal agent AI has just begun, with significant technical challenges remaining to achieve Artificial General Intelligence (AGI) [1]
- Multimodal large models integrate various inputs and outputs such as text, speech, images, and videos, enhancing processing capabilities and user interaction experiences [3]

Multimodal Technology Development
- Multimodal technology is essential for achieving AGI as it provides richer contextual understanding and improves model performance and accuracy [3]
- The technology can be categorized into understanding and generation tasks, with challenges in modality encoding alignment and high-quality content generation [3][4]

Technical Evolution
- Current multimodal understanding models are primarily based on pre-trained large model technology, with differences mainly in connector design and modality alignment methods [3]
- Multimodal understanding models mainly focus on visual and language aspects, with aspirations to handle more modalities in the future [3]

Future Directions
- Future multimodal large models are expected to unify understanding and generation, although key technologies such as backbone network design and modality alignment still require further research [4]
- The industry remains in its early stages, but there is confidence in the application prospects of multimodal technology in fields like search, creation, and robotics [4]
Still Don't Know Which Direction to Publish In? Others Have Already Submitted to CCF-A Venues...
具身智能之心· 2025-06-18 03:03
Group 1
- The core viewpoint of the article is the launch of a mentoring program for students aiming to publish papers in top conferences such as CVPR and ICRA, building on last year's successful outcomes [1]
- The mentoring directions include multimodal large models, VLA, robot navigation, robot grasping, embodied generalization, embodied synthetic data, end-to-end embodied intelligence, and 3DGS [2]
- The mentors have published papers in top conferences like CVPR, ICCV, ECCV, ICLR, RSS, ICML, and ICRA, indicating their rich guiding experience [3]

Group 2
- Students are required to submit a resume and must come from a domestic top 100 university or an international university ranked within QS 200 [4][5]
JD.com to Offer More Than 18,000 Jobs to New Graduates This Year
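Source: Beijing Daily app

A reporter recently learned from JD.com that the company will offer more than 18,000 positions to the graduating class of 2025 this year. Data show that, as of April 30, the JD.com group had more than 720,000 employees in total, of whom more than 500,000 are front-line workers such as couriers, transport drivers, and sorting staff.

"It was a wonderful surprise to lock in a formal campus-recruitment offer early by passing the conversion review after my internship," said Xiao Wei, who formally joined JD.com last year. He said the company has set up a fast-track growth path for university talent: within just one year of joining he was promoted twice and has grown into a procurement and sales specialist who can work independently.

Shi Yu, head of employer branding at JD Group, said that on top of the more than 50,000 positions offered to current students over the past three consecutive years, the company is offering a further 18,000-plus positions to the class of 2025 this year, with salaries for core positions raised by 20%. In addition, in May this year JD.com launched its "Top Young Tech Genius Program" for recruiting technical talent worldwide, continuing to offer more high-quality positions in emerging fields, including multimodal large models and applications, machine learning, search, recommendation, and advertising, spatial and embodied intelligence, high-performance and cloud computing, and big data.

New technologies give rise to new occupations. In recent years the company has added many new roles, such as "large model+" intelligent ad placement, "AI+" medical services, home robot R&D, and drone pilots.

"With the 'five insurances and one housing fund', I feel secure and have more to look forward to," said Yang Jingze, who became a full-time JD Takeaway rider in March this year ...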
He Xiaopeng: On the Road to Large Models, Everyone Is Crossing the River by Feeling the Stones
news flash· 2025-06-12 11:31
Core Viewpoint
- He Xiaopeng, CEO of XPeng Motors, emphasized the importance of the new driving assistance chip "Turing" during the launch of the G7 SUV, indicating that the industry is still exploring the path of large models in autonomous driving technology [1]

Group 1: Company Insights
- XPeng Motors introduced its latest SUV model, the G7, on June 10, highlighting the significance of the "Turing" chip for driving assistance [1]
- The majority of the launch event was dedicated to discussing the capabilities and features of the "Turing" chip, showcasing the company's focus on advanced technology [1]

Group 2: Industry Trends
- The VLA solution is emerging as a preferred choice among leading players in China's driving assistance sector, with competitors like Li Auto also developing this solution [1]
- Approaches diverge between domestic companies and Tesla: Tesla continues to focus on an "end-to-end" solution rather than adopting multimodal large models [1]
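DeepGlint: Verification Opinion of Guotai Haitong Securities Co., Ltd. on the Change of Implementation Location for Certain Fundraising Investment Projects of Beijing DeepGlint Information Technology Co., Ltd.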
Zheng Quan Zhi Xing· 2025-06-12 10:28
Fundraising Overview
- The company raised a total of RMB 1,826.22 million (182,622.31 in units of RMB 10,000) from the public offering of 46,245,205 shares at a price of RMB 39.49 per share, with a net amount of RMB 1,670.09 million after deducting fees [1][4]
- The company has excess raised funds of RMB 670.09 million [1]

Project Investment Status
- The company announced the use of raised funds for the "Multimodal Large Model Technology and Application R&D Project," with a total investment of RMB 1,000.06 million allocated for this project [1][2]

Change of Project Implementation Location
- The implementation location for the "Multimodal Large Model Technology and Application R&D Project" is being changed from Yanqing District to Daxing District, while still maintaining the original location in Haidian District [1][2]
- The new location in Daxing District offers ample office space and proximity to key transportation hubs, which is expected to improve operational efficiency and project management [1][2]

Impact of Location Change
- The change in location aligns with the company's long-term development strategy and does not affect the project's content or the intended use of raised funds [3][4]
- The company will adhere to relevant regulations and strengthen supervision over the use of raised funds to ensure legality and effectiveness [3][4]

Review and Approval Process
- The change in project location was approved by the company's board and supervisory committee, confirming compliance with regulatory requirements [3][4]
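
The fundraising figures can be cross-checked from the share count and issue price quoted above. The sketch below is only an arithmetic check, not anything taken from the filing: it assumes the amounts are denominated in units of RMB 10,000 (万元), the unit customary in Chinese filings, and it assumes that "excess raised funds" means net proceeds minus the originally planned raise. All variable names are illustrative.

```python
from decimal import Decimal

# Figures quoted in the summary above.
shares = Decimal("46245205")        # shares issued
price = Decimal("39.49")            # issue price, RMB per share
net_wan = Decimal("167009.02")      # net proceeds, assumed to be in units of RMB 10,000
excess_wan = Decimal("67009.02")    # excess raised funds, same assumed unit

WAN = Decimal("10000")              # one "wan" = RMB 10,000

# Gross proceeds in yuan, then converted to units of RMB 10,000.
gross_yuan = shares * price
gross_wan = (gross_yuan / WAN).quantize(Decimal("0.01"))

# Derived values (not stated in the summary).
fees_wan = gross_wan - net_wan          # issuance fees implied by gross minus net
planned_wan = net_wan - excess_wan      # planned raise implied by net minus excess (assumption)

print("gross (yuan):", gross_yuan)
print("gross (RMB 10,000):", gross_wan)
print("implied fees (RMB 10,000):", fees_wan)
print("implied planned raise (RMB 10,000):", planned_wan)
```

Under these assumptions the gross proceeds come to about RMB 1.826 billion, which corresponds to 182,622.31 only when the unit is RMB 10,000 rather than RMB 1 million.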
2D Images as the Intermediary, SOTA 3D Scene Generation with Zero Training: NVIDIA and Cornell Propose a New Text-Driven Pipeline
机器之心· 2025-06-12 03:23
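The first author of this paper, Gu Zeqi, is a fourth-year PhD student in computer science at Cornell University, advised by Professor Abe Davis and Professor Noah Snavely, whose research focuses on generative AI and multimodal large models. This project was completed during the author's internship at NVIDIA.

Imagine you are a game designer building a scene for a fantasy RPG. You need to create an "elven treehouse village": towering ancient trees and treehouses, glowing mushroom street lamps, translucent gauze tents... In a traditional workflow this could take weeks: first modeling each 3D asset by hand, then adjusting positions and materials one by one, and finally testing the lighting again and again... In a word: hard.

Core contribution: a training-free intelligent 3D scene factory

ArtiScene's core innovation is a fully automated pipeline that requires no additional training and combines the cutting-edge capabilities of text-to-image generation with 3D reconstruction techniques. It consists of five steps:

1. A 2D image as the "design blueprint"

The system first uses a diffusion model to generate an isometric view of the scene. This viewpoint is common in architectural design diagrams because it shows an object's length, width, and height at the same time and is unaffected by the object's position in the scene. Compared with generating 3D directly, this approach can draw on more mature 2D generation technology to ensure a sensible layout and visual appeal.

This predicament epitomizes the current state of 3D content creation. Traditional 3D design software such as ...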
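
The two properties of the isometric view mentioned in step 1 can be illustrated with the textbook isometric projection. This is a minimal sketch under stated assumptions, not ArtiScene's code: the article says the view is produced by a diffusion model, not computed geometrically, and the function names here are hypothetical. The sketch shows that the three axes project to equal lengths and that an edge keeps the same projected length wherever it sits in the scene.

```python
import math

COS30 = math.cos(math.radians(30))
SIN30 = math.sin(math.radians(30))

def iso_project(x, y, z):
    """Textbook isometric projection of a 3D point (z up) to 2D (u right, v up)."""
    u = (x - y) * COS30
    v = z - (x + y) * SIN30
    return (u, v)

def projected_length(p, q):
    """Length on screen of the segment between 3D points p and q."""
    (u1, v1), (u2, v2) = iso_project(*p), iso_project(*q)
    return math.hypot(u2 - u1, v2 - v1)

origin = (0.0, 0.0, 0.0)

# Property 1: unit steps along x, y, and z all project to the same length,
# so length, width, and height are shown at the same scale.
axis_lengths = [
    projected_length(origin, axis)
    for axis in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
]

# Property 2: the projection is linear with no perspective division,
# so the same edge has the same on-screen length anywhere in the scene.
near_edge = projected_length((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
far_edge = projected_length((50.0, 80.0, 3.0), (51.0, 80.0, 3.0))

print("axis lengths:", axis_lengths)
print("near edge:", near_edge, "far edge:", far_edge)
```

A perspective camera would instead shrink the far edge, which is one reason a layout read off an isometric image is easier to map back to consistent 3D sizes.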
CVPR 2025 | A New Paradigm for Unified Multimodal Learning Arrives, with Data, Model, and Code All Open-Sourced
机器之心· 2025-06-12 00:53
Core Viewpoint
- The article discusses the development of a unified audio-visual scene understanding model, emphasizing the need for models to possess general understanding capabilities similar to humans, rather than focusing solely on single-task performance [2][13].

Group 1: Unified Learning Paradigm
- The current mainstream learning paradigm for multimodal models overlooks the heterogeneity of multimodal data and the complex relationships between tasks, leading to potential interference when tasks are jointly trained [2][13].
- A new paradigm for multimodal large model learning is proposed, focusing on effective task cooperation from both data and model perspectives, surpassing specialized models in various scene understanding tasks [3][14].

Group 2: Data and Model Structure
- The AV-UIE dataset is introduced, which includes explicit reasoning processes and specific temporal and spatial information to clarify the mutual assistance relationships between tasks [15][16].
- The proposed model architecture includes interaction-aware LoRA structures that allow for the decoupling of different capabilities, enabling tasks to share and benefit from enhanced abilities [21][23].

Group 3: Experimental Results
- The Crab model demonstrates superior general understanding capabilities across multiple tasks compared to other models, as evidenced by comprehensive ablation studies and performance comparisons [26][30].
- The model outperforms specialized models in tasks such as AVE, ARIG, and AVQA, showcasing its effectiveness in enhancing performance through task cooperation [27][28][29].
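Hardware Status of China's Multimodal Large Model Industry in 2025: Demand for AI Chips and AI Servers Accelerates Under the Influence of Multimodal Large Models [Image Set]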
Qian Zhan Wang· 2025-06-11 05:17
Core Insights
- The AI chip market in China is projected to reach 168.8 billion yuan in 2024, reflecting a year-on-year growth of 40% due to increasing demand and technological advancements [5]
- The AI server market is expected to grow significantly, with a projected market size of 11.5 billion USD in 2024 and 13.4 billion USD by 2027, indicating a compound annual growth rate of 22% from 2022 to 2027 [10]

AI Chip Overview
- AI chips are defined broadly as chips designed for artificial intelligence applications, with various designs and methods emerging to meet diverse demands [5]
- The classification of AI chips can be based on technical architecture, functionality, and application scenarios [5]
- Major companies in the AI chip sector include Huawei HiSilicon, Cambricon, Horizon Robotics, and others, focusing on applications in smart devices and security [7][8]

AI Server Overview
- AI servers are designed to support AI applications, consisting of components like DRAM, GPU, and acceleration chips, and can be categorized into deep learning training and intelligent application inference types [3]
- The demand for AI servers is increasing due to rapid advancements in digital infrastructure and the rise of multimodal large models, which require enhanced computational capabilities [9]
- Innovations in AI server technology are driven by the need for high-performance processors, large memory, and efficient cooling systems [9]

Competitive Landscape
- The AI chip market is concentrated among a few key players, with significant achievements in chip design and partnerships across various industries [7][8]
- Companies like Huawei, Cambricon, and Horizon Robotics are actively collaborating with automotive and technology firms to expand their market presence [8]
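
The growth figures in the Core Insights above can be sanity-checked with the usual compound-growth formula. The sketch below is illustrative only: the function names are hypothetical and the only inputs are the numbers quoted in the summary. The 22% rate is stated for 2022 to 2027, so it need not match the rate implied by the 2024 and 2027 figures alone.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

def implied_base(end_value, rate, years):
    """Starting value implied by an end value and a constant annual growth rate."""
    return end_value / (1 + rate) ** years

# AI servers: 11.5 billion USD (2024) and 13.4 billion USD (2027), as quoted.
rate_2024_2027 = cagr(11.5, 13.4, 3)

# AI servers: 2022 base implied by a 22% CAGR ending at 13.4 billion USD in 2027.
base_2022 = implied_base(13.4, 0.22, 5)

# AI chips: 2023 size implied by 168.8 billion yuan in 2024 after 40% growth.
chips_2023 = implied_base(168.8, 0.40, 1)

print(f"implied 2024-2027 server growth: {rate_2024_2027:.1%} per year")
print(f"implied 2022 server market: {base_2022:.2f} billion USD")
print(f"implied 2023 chip market: {chips_2023:.1f} billion yuan")
```

Run as written, the 2024 and 2027 server figures imply roughly 5% annual growth over those three years, while a 22% rate over 2022 to 2027 implies a 2022 base of about 5 billion USD. These derived values are arithmetic consequences of the quoted numbers, not figures from the source.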
Haitian Ruisheng 20250610
2025-06-10 15:26
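Summary
- Meta's investment in Scale AI aims to secure high-quality data and expand into markets such as defense in support of commercializing its AI; Meta also values Scale AI's customer base and its footprint in government, business, and military sectors.
- Scale AI's revenue is growing rapidly and is expected to reach USD 2 billion in 2025; its valuation has doubled to USD 27.6 billion, driven mainly by orders from the U.S. military and government.
- Haitian Ruisheng believes the spread of AI applications and the development of multimodal large models are enlarging the market; demand for visual data has surged, with visual revenue accounting for 49% of the total in Q1 2025.
- In 2025 Haitian Ruisheng is stepping up its data accumulation business and expanding overseas; its data delivery base in the Philippines provides low-cost capacity, and its content moderation business contributes cash flow.
- Haitian Ruisheng is strengthening its competitiveness through R&D innovation, AI-assisted annotation, and synthetic data, and is tracking new types of data demand.
- The development of domestic large models is driving cooperation between Haitian Ruisheng and central state-owned enterprises such as China Mobile; benefiting from the "沿投联动" (investment-linkage) mechanism, orders have grown significantly.
- Through its "3+1" model, Haitian Ruisheng takes part in local governments' data industrialization projects, providing services such as data governance and annotation, and adopts a localized deployment strategy to ensure compliance.

Q&A

What is the logic behind Meta's investment in Scale AI?

Meta's investment in Scale AI rests on two main considerations. First, data processing remains critical in AI training. Scale AI has ...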
Apple's AI Is a No-Show, While AI Recorders, AI Toys, and Other "New Chinese Products" Take Off First
Nan Fang Du Shi Bao· 2025-06-10 08:41
Group 1: Industry Trends
- The "2025 High-Quality Consumption Brand TOP100" initiative focuses on nine key sectors including beauty economy, sports and outdoor, food and health, smart consumer electronics, pet economy, experience economy, interest consumption, cross-border expansion, and consumption technology [2]
- AI and hardware integration is emerging as a significant trend across various sectors, with companies launching AI-enabled products that are breaking traditional market boundaries [2][3]
- The global AI hardware market is witnessing rapid growth, with notable products like AI recorders and AI glasses gaining traction [3][5]

Group 2: AI Hardware Developments
- The AI recorder Plaud Note has achieved significant success, with nearly 700,000 units shipped globally and an annual revenue of $100 million, reflecting a tenfold growth over two years [5][11]
- AI glasses are becoming increasingly popular, with companies like Thunderbird and Rokid announcing new products that leverage AI for enhanced user experiences [7][8]
- AI technology is enhancing the functionality of household appliances, with smart kitchen devices seeing over 30% sales growth in 2024 [20][21]

Group 3: Consumer Insights
- A survey indicated that over 30% of consumers are motivated to purchase products that incorporate AI technology, with more than half feeling a sense of upgrade when encountering AI-enabled Chinese brands [22]
- The integration of AI in household appliances is shifting the industry from passive response to proactive service, creating interconnected smart home ecosystems [22][23]