Multimodal Large Models

Glodon (广联达, 002410) - 002410 Glodon Investor Relations Management Information 20250621
2025-06-21 13:35
Group 1: AI Strategy and Advantages
- The company has developed AecGPT, a large model built specifically for the construction industry. It was released in 2024 and can pass the national construction examination with high scores [2]
- Key elements for successful industrial AI include high-quality data, valuable scenarios, and reliable models [2]
- The company has a comprehensive engineering and construction knowledge base that supports its construction large model [2]

Group 2: AI Application Scenarios
- The company focuses on three main directions for AI scenario implementation: integrated design, refined cost management, and precise construction management [3][4]
- In integrated design, AI is used to enhance design workflows and assist with construction drawing design [3]
- Refined cost management uses AI and data to drive detailed cost management throughout the project lifecycle [4]

Group 3: Value Measurement and Commercialization
- High-value AI applications should deliver a complete task process, be measurable in value, and continuously learn and optimize [5]
- The AI intelligent bidding product has been used in 716 construction bidding projects in Hainan, with an average bid reduction rate of 8% and savings of approximately CNY 4.56 billion [5]
- The commercial value of AI products is closely linked to technological maturity and the ability to meet new demands [6]

Group 4: Future AI Opportunities
- Future high-value AI scenarios will emerge from both technological breakthroughs and evolving market demands [7]
- Market reforms due in September 2025 will drive the need for effective data management and cost control in the construction industry [7]
- The company is developing an automatic database construction product to improve the efficiency of data collection and analysis [7]
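The Hainan bidding figures quoted above can be put on a per-project footing with some back-of-envelope arithmetic. The sketch below uses only the three quoted numbers. The implied baseline is an assumption: the source reports an average reduction rate, which need not equal the aggregate ratio of savings to original bid value.

```python
# Figures quoted in the summary above.
projects = 716                 # bidding projects in Hainan
avg_reduction = 0.08           # average bid reduction rate
total_saving_cny = 4.56e9      # approximate total saving in CNY

# Mean saving per project.
saving_per_project = total_saving_cny / projects

# Assumption (not stated in the source): treat the 8% average as if it
# also held in aggregate, to get a rough pre-reduction bid total.
implied_baseline_cny = total_saving_cny / avg_reduction

print(f"Mean saving per project: CNY {saving_per_project / 1e6:.2f} million")
print(f"Implied baseline (rough): CNY {implied_baseline_cny / 1e9:.1f} billion")
```

Under that assumption, the mean saving works out to roughly CNY 6.37 million per project.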
Arriving This Summer: OpenAI Teases GPT-5
Bei Jing Shang Bao· 2025-06-19 14:52
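Sam Altman, co-founder and CEO of OpenAI, revealed on a recent podcast that the much-anticipated GPT-5 is expected to be released this summer, though no specific date has been set. As the release approaches, the industry widely expects a new round of technical competition in multimodal large models, with GPT-5 marking a major upgrade in generative AI capabilities. Feedback from early testers suggests its performance is a significant improvement over GPT-4. Some, however, worry that GPT-5 has been repeatedly delayed since last year: could this be another case of "crying wolf"?

A major leap in AI capability

OpenAI has launched an official podcast, with its CEO leading off. On June 18 local time, OpenAI released a video interview with Sam Altman. In the 40-minute interview, Altman addressed widely discussed topics including GPT-5, privacy protection, the advertising business, and the US$500 billion investment project "Stargate". Altman said GPT-5 would be released "probably sometime this summer", but added that the company is still debating internally whether new models should simply get a higher version number or be continuously optimized and improved, as GPT-4 was.

Altman also hinted that GPT-5 represents more than a performance upgrade: it may mark OpenAI's first real step toward a unified, agent-like model, bringing the company closer to its goal of artificial general intelligence. "I think we are close to the end of this mountain," he said. G ...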
Alibaba Group Vice President Xu Zhuhong (许主洪): Multimodal Large Models Are a Key Path to AGI | Live from MWC Shanghai 2025
Guo Ji Jin Rong Bao· 2025-06-19 10:48
Core Insights
- The era of multimodal agent AI has just begun, and significant technical challenges remain on the way to Artificial General Intelligence (AGI) [1]
- Multimodal large models integrate inputs and outputs such as text, speech, images, and video, improving processing capabilities and the user interaction experience [3]

Multimodal Technology Development
- Multimodal technology is essential for achieving AGI because it provides richer contextual understanding and improves model performance and accuracy [3]
- The technology can be divided into understanding tasks and generation tasks, with challenges in modality encoding alignment and high-quality content generation [3][4]

Technical Evolution
- Current multimodal understanding models are primarily built on pre-trained large models and differ mainly in connector design and modality alignment methods [3]
- Multimodal understanding models focus mainly on vision and language, with the aim of handling more modalities in the future [3]

Future Directions
- Future multimodal large models are expected to unify understanding and generation, although key technologies such as backbone network design and modality alignment still require further research [4]
- The industry remains in its early stages, but there is confidence in the application prospects of multimodal technology in fields such as search, creation, and robotics [4]
Still Don't Know Which Direction to Publish In? Others Have Already Submitted to CCF-A Venues......
具身智能之心· 2025-06-18 03:03
Group 1
- The article announces a mentoring program for students aiming to publish papers at top conferences such as CVPR and ICRA, building on last year's successful outcomes [1]
- The mentoring directions include multimodal large models, VLA, robot navigation, robot grasping, embodied generalization, embodied synthetic data, end-to-end embodied intelligence, and 3DGS [2]
- The mentors have published papers at top conferences such as CVPR, ICCV, ECCV, ICLR, RSS, ICML, and ICRA and have extensive mentoring experience [3]

Group 2
- Students are required to submit a resume and must come from a domestic top-100 university or an international university ranked within the QS top 200 [4][5]
JD.com to Offer More Than 18,000 Jobs to Fresh Graduates This Year
Bei Jing Ri Bao Ke Hu Duan· 2025-06-13 01:11
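Reposted from: 北京日报客户端 (Beijing Daily app)

This reporter recently learned from JD.com that the company will offer more than 18,000 positions to the class of 2025 this year. Data show that as of April 30, the JD.com group had more than 720,000 employees in total, of whom more than 500,000 were frontline staff such as couriers, transport drivers, and sorters.

"It was a wonderful surprise! After my internship I passed the conversion review and locked in a formal campus-recruitment offer early," said Xiao Wei (晓韦), who formally joined JD.com last year. He said the company has set up a fast-track growth path for university talent. In just one year after joining, he was promoted twice and grew into a procurement and sales specialist able to work independently.

Shi Yu (石玉), head of employer branding at JD Group, said that after providing more than 50,000 positions to current students over three consecutive years, the company is offering more than 18,000 further positions to 2025 graduates this year, with pay for core positions raised by 20%. In addition, in May this year JD.com launched the "Top Young Tech Genius Program" ("顶尖青年技术天才计划") to recruit technical talent worldwide. The program continues to add high-quality positions in emerging fields, covering frontier areas such as multimodal large models and applications, machine learning, search, recommendation and advertising, spatial and embodied intelligence, high-performance and cloud computing, and big data.

New technologies give rise to new occupations. In recent years the company has added many new roles, such as "large model +" intelligent ad-placement roles, "AI +" medical service roles, home-robot R&D roles, and drone pilots.

"With the 'five insurances and one housing fund', I feel secure and have more to look forward to," said Yang Jingze (杨晶泽), who became a full-time JD Takeaway rider in March this year ...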
He Xiaopeng: On the Road to Large Models, Everyone Is Crossing the River by Feeling the Stones
news flash· 2025-06-12 11:31
Core Viewpoint
- He Xiaopeng, CEO of XPeng Motors, emphasized the importance of the new driving assistance chip "Turing" at the launch of the G7 SUV and indicated that the industry is still exploring how to apply large models to autonomous driving [1]

Group 1: Company Insights
- XPeng Motors introduced its latest SUV model, the G7, on June 10, highlighting the significance of the "Turing" chip for driving assistance [1]
- Most of the launch event was devoted to the capabilities and features of the "Turing" chip, reflecting the company's focus on advanced technology [1]

Group 2: Industry Trends
- The VLA solution is emerging as a preferred choice among leading players in China's driving assistance sector, with competitors such as Li Auto also developing it [1]
- Domestic companies and Tesla are taking different approaches: Tesla continues to focus on an "end-to-end" solution rather than adopting multimodal large models [1]
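DeepGlint (格灵深瞳): Verification Opinion of Guotai Haitong Securities Co., Ltd. on the Change of Implementation Location for Certain Fundraising Investment Projects of Beijing DeepGlint Information Technology Co., Ltd.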
Zheng Quan Zhi Xing· 2025-06-12 10:28
Fundraising Overview
- The company raised a total of RMB 1,826.2231 million (182,622.31 in units of RMB 10,000) from the public offering of 46,245,205 shares at RMB 39.49 per share, with net proceeds of RMB 1,670.0902 million after deducting fees [1][4]
- Excess raised funds amount to RMB 670.0902 million [1]

Project Investment Status
- The company announced the use of raised funds for the "Multimodal Large Model Technology and Application R&D Project", with a total investment of RMB 1,000.0617 million allocated to this project [1][2]

Change of Project Implementation Location
- One implementation location for the "Multimodal Large Model Technology and Application R&D Project" is being changed from Yanqing District to Daxing District, while the original location in Haidian District is retained [1][2]
- The new Daxing District location has ample office space and is close to key transportation hubs, which supports operational efficiency and project management [1][2]

Impact of Location Change
- The change aligns with the company's long-term development strategy and does not affect the project's content or the intended use of raised funds [3][4]
- The company will adhere to relevant regulations and strengthen supervision over the use of raised funds to ensure legality and effectiveness [3][4]

Review and Approval Process
- The change of project location was approved by the company's board and supervisory committee, confirming compliance with regulatory requirements [3][4]
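The gross proceeds can be cross-checked from the share count and issue price quoted above. The sketch below is only a sanity check on units, not part of the filing. It uses `Decimal` so the product is exact, and it reports the result both in millions of RMB and in units of RMB 10,000 (万元), the unit that Chinese filings commonly use (an assumption about the underlying document).

```python
from decimal import Decimal

# Figures quoted in the summary above.
shares_issued = Decimal("46245205")
price_per_share = Decimal("39.49")   # RMB

# Gross proceeds in RMB (exact decimal arithmetic).
gross_rmb = shares_issued * price_per_share

# Express the same amount in two common reporting units.
gross_in_millions = gross_rmb / Decimal("1000000")
gross_in_wan = gross_rmb / Decimal("10000")   # units of RMB 10,000 (assumed filing unit)

print(f"Gross proceeds: RMB {gross_rmb:,.2f}")
print(f"  = RMB {gross_in_millions:,.2f} million")
print(f"  = {gross_in_wan:,.2f} (units of RMB 10,000)")
```

The product is about RMB 1.83 billion, which matches the figure 182,622.31 only when that figure is read in units of RMB 10,000.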
2D Images as the Intermediary, SOTA 3D Scene Generation with Zero Training: NVIDIA & Cornell Propose a New Text-Driven Pipeline
机器之心· 2025-06-12 03:23
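The first author of this work, Zeqi Gu (顾泽琪), is a fourth-year PhD student in computer science at Cornell University, advised by Professors Abe Davis and Noah Snavely, with research focused on generative AI and multimodal large models. This project was completed during the author's internship at NVIDIA.

Imagine you are a game designer building a scene for a fantasy RPG. You need to create an "elven treehouse village": towering ancient trees and treehouses, glowing mushroom street lamps, translucent gauze tents... In a traditional workflow this could take weeks: first modeling each 3D asset by hand, then adjusting positions and materials one by one, and finally testing the lighting over and over... In a word: hard.

Core contribution: a training-free intelligent 3D scene factory

ArtiScene's core innovation is a fully automated pipeline that requires no additional training and combines the cutting-edge capabilities of text-to-image generation with 3D reconstruction techniques. It consists of five steps:

1. A 2D image as the "design blueprint"

The system first uses a diffusion model to generate an isometric view of the scene. This view is common in architectural design schematics because it shows an object's length, width, and height at once and is unaffected by position within the scene. Compared with generating 3D directly, this approach can draw on more mature 2D generation technology to ensure a plausible layout and visual appeal.

This predicament is a microcosm of today's 3D content creation field. Traditional 3D design software such as ...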
CVPR 2025 | A New Paradigm for Unified Multimodal Learning Arrives, with Data, Models, and Code All Open-Sourced
机器之心· 2025-06-12 00:53
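The first author of this work, Du Henghui (杜恒辉), is a second-year master's student at Renmin University of China, advised by Associate Professor Hu Di (胡迪). His main research interests are audio-visual scene understanding and reasoning with multimodal large models, and long-video understanding. The authors are from Renmin University of China, Tsinghua University, and the Beijing Tencent PCG AI Technology Center.

We humans live in a world full of visual and audio information. In recent years, much work has used these two modalities to strengthen models' understanding of audio-visual scenes, giving rise to many different types of tasks, each requiring capabilities at a different level.

Much past work has focused on completing a single task. By contrast, humans have a general ability to perceive and understand the complex world around us. How to design a model with human-like general understanding of audio-visual scenes is therefore an extremely important question on the road to AGI. The current mainstream learning paradigm is to build a large-scale multi-task instruction-tuning dataset and perform instruction tuning directly on it. But is this paradigm optimal for multi-task learning?

A CVPR 2025 paper recently published jointly by GeWu-Lab at the Gaoling School of Artificial Intelligence, Renmin University of China, together with Tsinghua University and the Beijing Tencent PCG AI Technology Center, points out that this mainstream paradigm overlooks the heterogeneity of multimodal data and the complex relationships between tasks, and that simply training all tasks jointly may cause interference between them.

To effectively achieve explicit inter-task mutual ...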
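Hardware Status of China's Multimodal Large Model Industry in 2025: Demand for AI Chips and AI Servers Accelerates Under the Influence of Multimodal Large Models [Image Set]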
Qian Zhan Wang· 2025-06-11 05:17
Core Insights
- The AI chip market in China is projected to reach 168.8 billion yuan in 2024, reflecting year-on-year growth of 40% due to increasing demand and technological advances [5]
- The AI server market is expected to grow significantly, with a projected market size of 11.5 billion USD in 2024 and 13.4 billion USD by 2027, indicating a compound annual growth rate of 22% from 2022 to 2027 [10]

AI Chip Overview
- AI chips are defined broadly as chips designed for artificial intelligence applications, with various designs and methods emerging to meet diverse demands [5]
- AI chips can be classified by technical architecture, functionality, and application scenario [5]
- Major companies in the AI chip sector include Huawei HiSilicon, Cambricon, Horizon Robotics, and others, focusing on applications in smart devices and security [7][8]

AI Server Overview
- AI servers are designed to support AI applications, consist of components such as DRAM, GPUs, and acceleration chips, and can be divided into deep learning training and intelligent application inference types [3]
- Demand for AI servers is increasing due to rapid advances in digital infrastructure and the rise of multimodal large models, which require greater computational capability [9]
- Innovation in AI server technology is driven by the need for high-performance processors, large memory, and efficient cooling systems [9]

Competitive Landscape
- The AI chip market is concentrated among a few key players, with significant achievements in chip design and partnerships across various industries [7][8]
- Companies such as Huawei, Cambricon, and Horizon Robotics are actively collaborating with automotive and technology firms to expand their market presence [8]
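The AI server figures above combine two quoted market sizes (2024 and 2027) with a growth rate stated for a different window (2022 to 2027). As a minimal sketch of how a compound annual growth rate is computed, the snippet below derives the rate implied by the two quoted endpoints alone. The 2022 base value is not given in the summary, so the 22% figure itself cannot be reproduced or checked here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Market sizes quoted in the summary above, in billions of USD.
size_2024 = 11.5
size_2027 = 13.4

# Rate implied by these two endpoints only (2024 to 2027, three years).
implied_rate = cagr(size_2024, size_2027, 2027 - 2024)
print(f"Implied 2024-2027 CAGR: {implied_rate:.1%}")
```

The same helper can be applied to the 2022 to 2027 window once a 2022 base value is available from the original report.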