Multimodal Large Models

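"Godfather of AI" Hinton in dialogue with Shanghai AI Lab's Zhou Bowen: multimodal chatbots are already conscious, and making AI smart and making AI kind are two separate things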
量子位· 2025-07-26 15:56
Core Viewpoint
- Geoffrey Hinton, widely known as the "godfather of AI," visited Shanghai, China, for discussions on AI advancements, emphasizing the intersection of AI and scientific discovery [1][2][3]

Group 1: Hinton's Visit and Discussions
- Hinton's visit included a public dialogue with Zhou Bowen, director of the Shanghai Artificial Intelligence Laboratory, focusing on cutting-edge AI research [2][3]
- The dialogue covered multimodal large models, subjective experience, and how to train a "kind" superintelligence [3][9]
- Attendees applauded and recorded the event, a sign of Hinton's standing in the AI field [2]

Group 2: AI and Scientific Discovery
- Zhou Bowen presented the "SAGE" framework, which integrates foundational models, fusion layers, and evaluation layers to elevate AI from a tool to an engine for scientific discovery [3]
- Hinton noted that AI can significantly advance scientific research, citing protein folding and weather prediction as examples where AI outperforms traditional methods [16][17]

Group 3: Perspectives on AI Consciousness
- Hinton expressed the view that current multimodal chatbots already possess a form of consciousness, challenging conventional beliefs about AI capabilities [9][13]
- He discussed the importance of understanding subjective experience in AI, suggesting that many people hold misconceptions about how concepts such as subjective experience actually work [12]

Group 4: Training AI for Kindness
- Hinton proposed that making AI intelligent and making AI kind call for different techniques, so countries can share methods for fostering kindness without sharing the methods that make AI more intelligent [14][15]
- He emphasized the need for ongoing research into universal methods for instilling kindness in AI systems as they become more intelligent [15][16]

Group 5: Advice for Young Researchers
- Hinton advised young researchers to explore areas where they believe "everyone is wrong," and to persist with their own approach until they understand the reasoning behind the established methods [18]
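Kling AI upgrades its multi-image reference video generation model: results "improved by 102%"; XPeng Robotics sets up a new Intelligent Mimetic Department focused on multimodal robotics | AIGC Daily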
创业邦· 2025-07-26 01:02
Group 1
- XPeng Robotics has established a new Intelligent Mimetic Department focused on multimodal robotics, with research directions including embodied intelligence, native multimodal large models, world models, and spatial intelligence [1]
- Kling AI has upgraded its multi-image reference video model, with a reported 102% improvement in results, particularly in character, subject, and scene consistency, dynamic quality, and preservation of artistic style [2]
- Zhipu's upcoming GLM-4.5 series is expected to adopt a new mixture-of-experts (MoE) architecture, with two models anticipated: GLM-4.5 (355B-A32B) and GLM-4.5-Air (106B-A12B) [3]
- Alibaba has released the open-source Qwen3 reasoning model, which matches the performance of the top closed-source models Gemini 2.5 Pro and o4-mini, a significant result for open-source models [4]
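Employee fired for objecting to wearing a miniskirt to hand out prizes? Yuanfudao: dismissed for substandard work; Nongfu Spring shares jump nearly 6%; Unitree's latest humanoid robot starts at 39,900 yuan | Bang Morning Briefing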
创业邦· 2025-07-26 01:02
Group 1
- The article discusses the results of a driving assistance test conducted by Dongchedi, which has sparked controversy among car manufacturers, particularly over the performance of Tesla vehicles [2][3]
- The test involved nearly 40 models from over 20 brands, simulating 15 types of high-risk accident scenarios in urban and highway settings [2]
- Tesla's Model 3 and Model X achieved a 100% pass rate, the only models to pass every test; other car manufacturers responded by pointing to technical challenges common across the industry [2]

Group 2
- Nongfu Spring's stock price surged nearly 6%, peaking at 47.4 HKD, a new high since January 2022, with a market capitalization of 523 billion HKD [6]
- Jensen Huang (Huang Renxun) confirmed the existence of a "secret option pool" for rewarding outstanding employees, emphasizing immediate rewards without lengthy approval processes [8]
- Huang's company plans to use machine learning to review compensation for its 42,000 employees, treating employee welfare as a priority [8]

Group 3
- BOSS Zhipin responded to a controversy regarding a job seeker's resume being inappropriate, stating that the account involved has been permanently banned from the platform [13]
- XPeng Robotics established a new department focused on multimodal robotics, indicating a strategic shift towards advanced AI applications [13]
- Chery clarified that its collaboration with JSW Group covers only parts supply and does not extend to technology transfer [16]

Group 4
- Tesla's Optimus robot production is significantly behind schedule, with only a few hundred units produced this year, far from the 5,000-unit target set by CEO Elon Musk [24]
- Google CEO Sundar Pichai's personal wealth has surpassed 1 billion USD, a rare achievement for a non-founder CEO [24]
- Shentong Express announced plans to acquire Daniao Logistics for 362 million CNY; Daniao Logistics will become a wholly-owned subsidiary after the transaction [25]

Group 5
- Sony plans to acquire 2.5% of Bandai Namco's shares to jointly develop and promote anime IPs [25]
- NewPrinces is set to acquire Carrefour's Italian business for nearly 1 billion EUR, aiming to become the second-largest food and beverage group in Italy [25]
- AI startup Anthropic is negotiating a new funding round at a valuation of over 150 billion USD, up sharply from its current valuation of 61.5 billion USD [25]

Group 6
- OSL Group completed a 300 million USD equity financing, the largest public equity financing in Asia's digital asset sector [25]
- Shanghai Guotou will participate in a new funding round for the AI startup Jiyue Xingchen, with expected funding exceeding 500 million USD [25]
- Yuzhi Tongxing completed a multi-million angel round financing, focusing on AI technology integration [26]

Group 7
- Unitree Technology launched its third humanoid robot, the Unitree R1, priced from 39,900 CNY and featuring multimodal capabilities [26]
- Neuralink is collaborating on clinical trials for smart bionic eyes, aiming to assist the visually impaired [28]
- Volvo's 2026 S60 was launched with upgraded features, including a 360-degree panoramic camera and adaptive cruise control, priced from 306,900 CNY [28]
SenseTime completes HKD 2.5 billion share placement, accelerating its push into embodied intelligence
Jing Ji Guan Cha Wang· 2025-07-24 10:35
Core Viewpoint
- SenseTime successfully completed the placement of 1.667 billion new Class B shares, raising approximately HKD 2.5 billion, with funds primarily allocated to AI core business development and strategic positioning in cutting-edge fields such as embodied intelligence and real-world assets [1][2].

Group 1: Fundraising Details
- The 1.667 billion placed shares represent 4.58% of the company's issued Class B shares and 4.50% of its total issued shares. The subscription price of HKD 1.50 per share is a discount of approximately 6.25% to the closing price on July 23 [2].
- The entire placement was subscribed by Infini Capital, which focuses on the global capital allocation needs of Middle Eastern sovereign wealth funds and family offices [2].

Group 2: Allocation of Funds
- 30% of the net proceeds will be used for the development of the AI core business, including the expansion of the "SenseTime Big Device" infrastructure platform [3].
- Another 30% will support research and development of generative AI and multimodal large models, aiming to commercialize applications in vertical fields such as smart hardware and digital finance [3].
- 20% will be invested in the integration of embodied intelligence and emerging technologies, while the remaining 20% will be allocated to general operating expenses [3].

Group 3: Strategic Developments
- SenseTime plans to establish an independent company focused on embodied intelligence, with a core team including its chief scientist and former JD Research Institute director [4].
- The company has restructured its organization into a "1+X" model, where "1" represents the core business and "X" represents an ecosystem of independent enterprises in sectors such as smart vehicles and home robots [4].

Group 4: Industry Context
- The AI industry in China is experiencing significant growth in financing, with leading companies like SenseTime accelerating their technological positioning through capital operations [5].
- The competition in AI technology is evolving from the algorithmic level to hardware and application scenarios, with a shift towards "technology leadership" rather than just "high cost-performance alternatives" [5].
- SenseTime has engaged in deep collaborations with various embodied intelligence companies, developing projects such as the "embodied intelligence brain" and emotional support robots [5][6].
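The placement figures quoted in this entry can be cross-checked with a few lines of arithmetic. The sketch below recomputes the gross proceeds and the implied July 23 closing price from the quoted share count, price, and discount, then splits the proceeds 30/30/20/20. The function names are hypothetical, and because the summary does not state the net proceeds, the gross figure is used as a stand-in for them (an assumption).

```python
# Figures quoted in the summary above.
SHARES_PLACED = 1.667e9   # new Class B shares
PRICE_HKD = 1.50          # subscription price per share
DISCOUNT = 0.0625         # discount to the July 23 closing price

# Stated use-of-proceeds split (fractions of net proceeds).
ALLOCATION = {
    "AI core business": 0.30,
    "Generative AI and multimodal R&D": 0.30,
    "Embodied intelligence and emerging tech": 0.20,
    "General operating expenses": 0.20,
}


def gross_proceeds(shares: float, price: float) -> float:
    """Gross amount raised: shares placed times subscription price."""
    return shares * price


def implied_close(price: float, discount: float) -> float:
    """Closing price implied by a subscription price and its discount."""
    return price / (1.0 - discount)


def allocate(proceeds: float, weights: dict) -> dict:
    """Split proceeds by the stated weights (gross used in place of net)."""
    return {use: proceeds * w for use, w in weights.items()}


if __name__ == "__main__":
    raised = gross_proceeds(SHARES_PLACED, PRICE_HKD)
    print(f"Gross proceeds: HKD {raised / 1e9:.4f} billion")
    print(f"Implied July 23 close: HKD {implied_close(PRICE_HKD, DISCOUNT):.2f}")
    for use, amount in allocate(raised, ALLOCATION).items():
        print(f"{use}: HKD {amount / 1e9:.3f} billion")
```

The quoted numbers are mutually consistent: the share count times the price gives roughly HKD 2.5 billion, and a 6.25% discount on HKD 1.50 implies a prior close of HKD 1.60.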
A generational gap? How autonomous driving research directions have evolved at ICCV 2025...
自动驾驶之心· 2025-07-24 09:42
Core Insights
- The article highlights the latest advancements in autonomous driving technologies, focusing on various research papers and frameworks that contribute to the field [2][3].

Multimodal Models & VLA
- ORION presents a holistic end-to-end framework for autonomous driving, utilizing vision-language instructed action generation [5].
- An all-in-one large multimodal model for autonomous driving is introduced, showcasing its potential applications [6][7].
- MCAM focuses on multimodal causal analysis for ego-vehicle-level driving video understanding [9].
- AdaDrive and VLDrive emphasize self-adaptive systems and lightweight models for efficient language-grounded autonomous driving [10].

Simulation & Reconstruction
- ETA proposes a dual approach to self-driving with large models, enhancing efficiency through forward-thinking [13].
- InvRGB+L introduces inverse rendering techniques for complex scene modeling [14].
- AD-GS and BézierGS focus on object-aware scene reconstruction and dynamic urban scene reconstruction, respectively [18][19].

End-to-End & Trajectory Prediction
- Epona presents an autoregressive diffusion world model for autonomous driving, enhancing trajectory prediction capabilities [25].
- World4Drive introduces an intention-aware physical latent world model for end-to-end autonomous driving [30].
- MagicDrive-V2 focuses on high-resolution long video generation for autonomous driving with adaptive control [35].

Occupancy Networks
- The article discusses advancements in 3D semantic occupancy prediction, highlighting the transition from binary to semantic data [44].
- GaussRender and GaussianOcc focus on learning 3D occupancy with Gaussian rendering techniques [52][54].

Object Detection
- Several papers address 3D object detection, including MambaFusion, which emphasizes height-fidelity dense global fusion for multimodal detection [64].
- OcRFDet explores object-centric radiance fields for multi-view 3D object detection in autonomous driving [69].

Datasets
- The ROADWork Dataset aims to improve recognition and analysis of work zones in driving scenarios [73].
- Research on driver attention prediction and motion planning is also highlighted, showcasing the importance of understanding driver behavior in autonomous systems [74][75].
Policy, market, and technology align: Dongtu's Hongdao operating system reaches its window for commercial deployment
Sou Hu Wang· 2025-07-24 08:26
Group 1
- The articles argue that China is entering a critical window for the commercialization of humanoid robots, with a predicted surge in applications by the second half of 2025, driven by government support and increasing commercial orders [1][2][3]
- Morgan Stanley's report indicates that the Chinese government is giving the embodied intelligence industry unprecedented policy support, aiming to cultivate a trillion-level industry cluster by 2027 [1]
- Recent commercial orders, such as a 90.51 million yuan procurement project won by UBTECH and a 124 million yuan project won by Zhiyuan Robotics (AgiBot) and Unitree (Yushu Technology), signal the industry's transition into a phase of commercial validation [1]

Group 2
- The Hongdao AI robot operating system, developed by Dongtu Technology, is positioned to become a core engine for the industry's growth thanks to its technical architecture and ecosystem advantages [1][2]
- Analysts predict three major opportunities for the Hongdao operating system in the second half of the year: benefiting from policy incentives, validating system stability and performance through large-scale order deliveries, and accelerating technological iteration [1]
- The operating system allows AI reasoning and motion control to run in parallel on the same hardware platform, which significantly reduces system complexity and cost [2]

Group 3
- By 2050, China is predicted to have 302.3 million humanoid robots, creating a trillion-level market, while the U.S. is expected to have only 77.7 million [2]
- The Hongdao operating system is building a "Hongdao ecosystem" around its microkernel architecture and developer tooling, which is expected to become a standard configuration for Chinese robots in global markets [2]
- How quickly full-stack operating system vendors like Hongdao commercialize will affect not only their own development but also whether China has a say over core technology in the trillion-level human-machine collaboration industry [3]
Let's do something interesting together! 自动驾驶之心 (Autonomous Driving Heart) is still looking for a few partners
自动驾驶之心· 2025-07-23 02:12
Group 1
- The article discusses the recruitment of business partners for the "Autonomous Driving Heart" initiative, which aims to onboard 10 outstanding partners (individuals and enterprises) for various autonomous driving projects [2]
- The main focus areas for potential partners include large models, multimodal models, diffusion models, and other advanced AI technologies related to autonomous driving [2]
- Applicants are expected to hold a master's degree or higher from a university ranked within the QS top 200, with preference given to candidates with significant contributions at top conferences [2]

Group 2
- Benefits for partners include shared resources for job placement, PhD recommendations, and study abroad opportunities [3]
- The article mentions attractive cash incentives and opportunities to collaborate on entrepreneurial projects [3]
- Contact information is provided for parties interested in collaborating on autonomous driving projects [3]
Multimodal large models have an "inner alarm": jailbreak attacks can be detected without any training
机器之心· 2025-07-21 08:43
Core Viewpoint
- The rise of large vision-language models (LVLMs) has led to significant advances in tasks such as image-text question answering and visual reasoning, but these models are more susceptible to jailbreak attacks than text-only models [2][5].

Group 1: Multimodal Model Security Challenges
- LVLMs such as GPT-4V and LLaVA integrate images and text, which enhances their capabilities but also exposes them to security vulnerabilities [2].
- Existing methods for improving model security, including cross-modal safety fine-tuning and external discriminator modules, face challenges such as high training costs and poor generalization [3].

Group 2: HiddenDetect Methodology
- Researchers from CUHK MMLab and Taotian Group introduced HiddenDetect, a jailbreak detection method that requires no training [5].
- The core finding is that LVLMs retain rejection signals in their hidden states, particularly in intermediate layers, even when they go on to generate inappropriate content [5][9].

Group 3: Analysis of Rejection Signals
- The study constructs a "rejection semantic vector" (RV) from tokens that frequently occur in refusals, which makes it possible to measure the strength of the rejection signal at each model layer [9].
- Experimental results show significant differences in rejection signal strength between safe and unsafe inputs, with intermediate layers being more sensitive to safety concerns [9][10].

Group 4: Input Type Sensitivity
- The analysis reveals that different input modalities activate distinct safety pathways, with text inputs triggering the rejection signal earlier than image-text inputs [17][19].
- The presence of the visual modality can delay the model's rejection response, weakening its safety mechanisms [19].

Group 5: Experimental Results and Effectiveness
- HiddenDetect was evaluated on multiple mainstream LVLMs and performed robustly against various attack types while generalizing well [23].
- The method outperformed existing detection methods in both robustness and generalization [24].

Group 6: Future Directions
- The research emphasizes the importance of safety when deploying large models in real-world applications, and the authors aim to extend the detection method while exploring the relationship between modality information and model safety [28].
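The layer-wise rejection signal described in this entry can be illustrated with a toy sketch. This is not the HiddenDetect authors' implementation: it assumes each layer's last-token hidden state is projected to vocabulary space with the unembedding matrix (a logit-lens style projection), that the RV is a one-hot style vector over refusal token ids, and that the per-layer cosine similarities are averaged over a chosen set of layers and compared to a threshold. All names, the tiny matrices, and the threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def refusal_vector(vocab_size, refusal_token_ids):
    """Vocabulary-space vector with 1.0 at each refusal token id (assumed form of the RV)."""
    rv = [0.0] * vocab_size
    for token_id in refusal_token_ids:
        rv[token_id] = 1.0
    return rv


def project_to_vocab(hidden, unembed):
    """Map a hidden state to vocabulary logits; unembed has one row per token."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]


def layer_scores(hidden_per_layer, unembed, rv):
    """Rejection signal strength per layer: cosine(logits at that layer, RV)."""
    return [cosine(project_to_vocab(h, unembed), rv) for h in hidden_per_layer]


def detect(hidden_per_layer, unembed, rv, layers, threshold):
    """Average the signal over the chosen layers and flag if it exceeds the threshold."""
    scores = layer_scores(hidden_per_layer, unembed, rv)
    aggregate = sum(scores[i] for i in layers) / len(layers)
    return aggregate, aggregate > threshold


# Toy setup: 4-token vocabulary, 2-dimensional hidden states, token 1 is a refusal token.
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
RV = refusal_vector(vocab_size=4, refusal_token_ids=[1])

# Hypothetical last-token hidden states at three layers for two prompts.
SAFE_PROMPT = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.0]]
UNSAFE_PROMPT = [[1.0, 0.0], [0.1, 1.0], [0.2, 1.0]]

if __name__ == "__main__":
    for name, states in (("safe", SAFE_PROMPT), ("unsafe", UNSAFE_PROMPT)):
        score, flagged = detect(states, UNEMBED, RV, layers=[1, 2], threshold=0.3)
        print(f"{name}: score={score:.3f} flagged={flagged}")
```

With a real model, the hidden states would come from a forward pass that exposes per-layer activations, and the layer set and threshold would need to be chosen on held-out safe and unsafe prompts.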
Large-model interview notes: Kuaishou 快Star
自动驾驶之心· 2025-07-20 08:36
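Author: 小森 | Editor: 自动驾驶之心
Department and role: MMU, 【快Star】 multimodal large models
Original post: https://zhuanlan.zhihu.com/p/1928556109037281822

Round 1: The paper questions were quite detailed, and the interviewer asked follow-up questions to confirm any details I had not mentioned. The standard rote ("八股") questions were fairly routine; only the probability question was a bit annoying.
Round 2: Again a detailed grilling on my papers, so the interviewer clearly cares about papers; the rote questions were fairly simple. The scenario questions were also tiresome: the interviewer kept raising unsolved problems with the proposed solution, which had to be refined step by step.
Round 3

Questions
1. Self-introduction, followed by questions on internships and papers. We discussed the CV paper in depth; the interviewer was especially interested in the introduction of Diffusion, and we went through it from motivation to method to results, which took quite a while
2. Which multimodal large models do you know? Briefly introduce them. What does the current mainstream paradigm for multimodal large models look like?
3. In BLIP-2 or ...
8. Coding: 32. Longest Valid Parentheses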
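The coding question listed in this entry, "32. Longest Valid Parentheses", asks for the length of the longest well-formed parentheses substring. The post does not say which solution the candidate gave; the sketch below is one common stack-based approach, running in O(n) time, where the stack holds the index just before the current candidate substring.

```python
def longest_valid_parentheses(s: str) -> int:
    """Return the length of the longest well-formed parentheses substring of s."""
    # The bottom of the stack is always the index just before the start
    # of the current candidate substring; -1 is a sentinel for "before s".
    stack = [-1]
    best = 0
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        else:
            stack.pop()
            if not stack:
                # Unmatched ")": it becomes the new boundary.
                stack.append(i)
            else:
                # Matched pair: valid substring runs from stack[-1] + 1 to i.
                best = max(best, i - stack[-1])
    return best


if __name__ == "__main__":
    for case in ["(()", ")()())", "", "()(())"]:
        print(repr(case), longest_valid_parentheses(case))
```

A dynamic-programming solution or a two-pass counter scan with O(1) extra space would also work; the stack version is usually the easiest to explain in an interview.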
ACM MM 2025 | EventVAD: 7B parameters, training-free, a new SOTA for video anomaly detection
机器之心· 2025-07-20 03:11
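A research team from Peking University and Tsinghua University, working with JD.com, presented EventVAD at ACM MM 2025: an event-centric, low-cost, and efficient training-free framework for video anomaly detection. The paper's first author, Yihua Shao (邵轶骅), is currently an academic visiting student at Peking University, and the project lead is Ao Ma (马傲), an algorithm researcher at JD.com. The code and data have been fully open-sourced.

Among existing video anomaly detection (VAD) methods, supervised approaches depend on large amounts of in-domain training data and generalize poorly to unseen anomaly scenarios. Training-free approaches use the world knowledge of large language models (LLMs) for detection, but they suffer from insufficient fine-grained visual temporal localization, incoherent event understanding, and redundant model parameters.

To address this, the team from Peking University, Tsinghua University, and JD.com proposes a new video anomaly detection framework, EventVAD. It combines a dynamic graph architecture with the temporal event reasoning of multimodal large language models (MLLMs), reducing model parameters while significantly improving detection accuracy and efficiency. Experiments show that EventVAD surpasses existing SOTA methods on both the UCF-Crime and XD-Violence datasets, setting a new benchmark for the training-free setting.

Paper title: EventVAD: Tra ...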