Multimodal Large Models
NeurIPS 2025 Spotlight | Conditional Representation Learning: Aligning Representations with Criteria in One Step
机器之心· 2025-10-15 02:54
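The first author of this paper is Honglin Liu (刘泓麟), a PhD student at Sichuan University (email: tristanliuhl@gmail.com). The corresponding authors are Yunfan Li (李云帆), a postdoctoral researcher at Sichuan University, and Professor Xi Peng (彭玺) of Sichuan University.

An image carries information along many dimensions. Take Figure 1 below: we can read off at least three levels of information. The subject is elephants, there are two of them, and the setting is a tropical savanna. Yet if a conventional representation learning method processes this image, say by feeding it to a ResNet or Vision Transformer trained on ImageNet, the resulting representation usually reflects only the subject, and the image is simply assigned to the "elephant" class. This is clearly unreasonable.

Figure 1: Comparison of conventional representation learning (top) and conditional representation learning (bottom). Conventional methods learn only a single general-purpose representation and ignore other meaningful information. The conditional representation learning proposed in the paper produces, for a specified criterion, a conditional representation that is more expressive under that criterion and adapts to a variety of downstream tasks.

In addition, on major e-commerce platforms, users typically search for products by different criteria (for example color, material, or occasion). A user might search for "red dress" today, "formal wear" tomorrow, and a brand-new keyword the day after. For a platform with a huge catalog, manual tagging is impractical, and conventional representation learning can only obtain ...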
Interview Experiences for Large-Model Positions at 20 Chinese Companies
自动驾驶之心· 2025-10-14 23:33
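Author | 林夕@知乎  Source | 青稞AI  Original link: https://zhuanlan.zhihu.com/p/690801254

This article is shared for academic purposes only; if it infringes your rights, contact us for removal.

Interview overview

Companies applied to: Taotian (淘天), ByteDance, Ant Group, SenseTime, Meituan, Quark, Tencent, MiniMax, 01.AI (零一万物), Alibaba Holding (阿里控股), Luchen Technology (潞晨科技), Alibaba International, NetEase Lab (网易实验室), Momenta.

Offers: Taotian, ByteDance AML, SenseTime, Ant Group, Meituan, Quark, Tencent Hunyuan, Tianyi Cloud (天翼云).

The interview notes follow.

Taotian [offer]

Department: Future Life Lab (未来生活实验室)

Introduction: Taotian Group's large-model research will center on two scenarios: search, advertising and recommendation (搜广推), and content for the Guangguang (逛逛) feed. The team is being assembled jointly under Taotian Group CEO Dai Shan (戴珊), Taotian Group CTO Ruohai (若海), Alimama CTO Zheng Bo (郑波), and others.

Interview notes
First round:
HR round:

Interview experience: The experience was very good. HR was not overly aggressive, and the "Alibaba flavor" was not especially pronounced. After weighing the options I chose Taotian. If you are interested in joining us, you are welcome to send a résumé; we have GPUs (**** cards).

ByteDance AML [offer]

Department: AML Volcano Ark (火山方舟) large models

Introduction: 淘天集 ...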
浙商早知道 (Zheshang Morning Brief) - 20251015
ZHESHANG SECURITIES· 2025-10-14 23:30
Market Overview
- On October 14, the Shanghai Composite Index fell 0.62%, the CSI 300 fell 1.2%, the STAR 50 fell 4.26%, the CSI 1000 fell 1.95%, the ChiNext Index fell 3.99%, and the Hang Seng Index fell 1.73% [4]
- The best-performing sectors on October 14 were banking (+2.51%), coal (+2.18%), food and beverage (+1.69%), transportation (+0.5%), and utilities (+0.49%). The worst-performing sectors were telecommunications (-4.98%), electronics (-4.64%), non-ferrous metals (-3.66%), computers (-2.98%), and electrical equipment (-2.36%) [4]
- Total A-share turnover on October 14 was 2,596.6 billion yuan, and southbound funds recorded a net inflow of HKD 8.603 billion [4]

Key Insights

Cosmetic Industry
- The cosmetics market is expected to keep growing at a low single-digit rate in Q4, with brands diverging further. New consumer brands are recommended because they have upward momentum and room for valuations to roll forward to 2026 [5]
- New consumer brands are expected to grow revenue and profit at a compound annual rate of 20%-30% over the next 2-3 years, and remain attractive in terms of both market conditions and certainty [6]

Computer Industry
- The rise of domestic computing power and the application of AI are highlighted as key trends. Large-scale deployment of large language models still awaits breakthroughs [9]
- The domestic computing power supply chain is gradually taking shape, driven by revenue growth at domestic computing power vendors such as Cambricon. Faster adoption of multimodal large models is expected to bring commercial deployment in the video sector [9]
- The market believes large-model deployment still faces challenges, but advances such as Sora 2.0 are expected to break through physical-simulation barriers and could generate commercial value in video generation [10]
NeurIPS 25 | Sun Yat-sen University, UC Merced and Others Open-Source RAPID Hand, Redefining Data Collection for Multi-Fingered Dexterous Hands
机器之心· 2025-10-14 08:24
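Authors: Zhaoliang Wan, Zetong Bi¹, Zida Zhou², Hao Ren¹, Yiming Zeng¹, Yihan Li¹, Lu Qi³, Xu Yang⁴, Ming-Hsuan Yang³, Hui Cheng¹*

Paper title: RAPID Hand: A Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Generalist Robot Autonomy
Paper: https://www.arxiv.org/abs/2506.07490
Project page: https://rapid-hand.github.io/

Dexterous manipulation is one of the core capabilities a general-purpose robot needs in order to generalize across tasks. Whether for everyday household tidying and putting objects away or for assistive service tasks, a robot without dexterous manipulation can hardly complete complex interactions.

In recent years, as multimodal large models (VLMs) have gradually been applied to robot control, researchers have begun combining high-quality manipulation demonstrations with pretrained models for embodied reasoning and general manipulation policy ...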
上海网达软件股份有限公司 (Shanghai Wondertek Software Co., Ltd.): Announcement on the Proceedings of the 2025 Half-Year Results Briefing
Core Viewpoint
- The company held a half-year results briefing on October 13, 2025, to discuss its technological advantages and future plans in the current market environment [1].

Group 1: Technological Advantages
- The company has developed a comprehensive HD video solution that integrates intelligent encoding and decoding, a low-latency processing architecture, and AI deep analysis, achieving a transmission delay of 60 ms in low-bandwidth environments [1][2].
- In AI, the company focuses on security applications, building specialized models that understand industry knowledge, and has implemented intelligent analysis of 4K ultra-high-definition surveillance video [2].
- The company is advancing its media production capabilities by integrating AIGC content generation and intelligent agent collaboration, improving content dissemination efficiency and digital marketing [2].

Group 2: R&D Investment and Future Directions
- The company plans to focus on generative AI and its integration with video applications, emphasizing collaborative innovation in video encoding, editing, and recognition [5][6].
- Future R&D will target industry-specific models, AIGC applications, and XR technologies, while balancing the costs and benefits of R&D investment [6].

Group 3: Market Engagement and Strategic Initiatives
- The company is active in the low-altitude economy, developing intelligent inspection systems for drones and unmanned vehicles that align with national strategic needs [7].
- The company has implemented a cash dividend policy, distributing 1.50 yuan per 10 shares to shareholders, and will continue to balance short-term returns with long-term growth [8].

Group 4: R&D Expenditure and Efficiency
- The company reported lower R&D expenses as it concentrated on AI technology and optimized its high-end video product lines while cutting investment in mature and non-core areas [9].

Group 5: AI Safety Supervision Developments
- The company has advanced its AI-driven digital safety supervision systems, integrating multi-source data for dynamic perception and risk assessment across operational scenarios [10].
Sora 2 Released, Further Boosting Demand for Computing Power and Storage | Research Report
Core Viewpoint
- The electronics sector declined this week: the CSI 300 index fell 0.51%, the electronics sector fell 2.63%, and the semiconductor industry fell 3.28% [2]

Semiconductor Equipment and Materials
- Major domestic foundry SMIC maintains capital expenditure at $7-8 billion per year [2]
- Changchun Integrated Circuit was established on September 5 with a registered capital of 20.72 billion yuan [2]
- Changxin Technology's IPO guidance status changed to "guidance acceptance" on October 10 [2]
- Longchuan Technology expects Q3 net profit of 400-450 million yuan, a year-on-year increase of 180.67%-215.75% [2]
- Domestic semiconductor materials remain highly dependent on imports in areas such as photoresists and high-end precursors, but domestic production is steadily advancing [2]
- Dinglong Co. forecasts Q3 net profit of 190-220 million yuan, a year-on-year increase of 19.89%-38.82% [2]

Integrated Circuit Packaging and Testing
- Packaging and testing is one of the most localized segments of the semiconductor supply chain and is developing rapidly alongside technology upgrades [2]
- Advanced packaging is becoming a key path to higher performance, as emerging applications such as AI and HPC increase demand for high-end packaging [2]

Chip Design
- Overseas demand is recovering in consumer electronics, enterprise, communications, and industrial markets, while the automotive market has not yet shown signs of recovery [3]
- Domestic demand is benefiting from policy stimulus and the rise of new energy vehicle brands; industrial demand continues to recover and automotive demand is relatively strong [3]
- AI development is driving demand for CPUs, GPUs, and high-performance storage chips, and leading global cloud companies are actively developing in-house ASICs [3]
- Chip Yuan Co. expects Q3 revenue of 1.284 billion yuan, a single-quarter record, with year-on-year and quarter-on-quarter growth of 78.77% and 119.74% respectively [3]

Investment Recommendations
- OpenAI released its latest audio-video generation model, Sora 2, which topped the App Store free app chart shortly after launch [4]
- Alibaba announced plans to invest 380 billion yuan in AI infrastructure at the Cloud Summit in September [4]
- The development of multimodal large models is expected to further drive demand for computing power [4]
- Recommendations include focusing on Chip Yuan Co., Cambricon, Haiguang Information, SMIC, and Hua Hong Semiconductor [4]
- Domestic storage manufacturers are expected to account for a significant share of capital expenditure at domestic wafer fabs next year, with recommendations to focus on companies such as Zhongwei, Tuojing Technology, Beifang Huachuang, Longchuan Technology, and Anji Technology [4]
- The growth of AI is expected to drive an upward cycle in the storage chip industry, with recommendations to watch companies such as Zhaoyi Innovation, Beijing Junzheng, and Lanke Technology [4]
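The profit forecasts above pair a range in yuan with a range of year-on-year growth rates, which allows a quick sanity check: both ends of each range should imply the same prior-year figure. The sketch below is only an illustration of that arithmetic; the function names and the tolerance are assumptions, and the inputs are the figures quoted in the summary (in millions of yuan).

```python
def implied_base(current: float, yoy_growth_pct: float) -> float:
    """Back out the prior-year figure from a current value and its YoY growth in percent."""
    return current / (1.0 + yoy_growth_pct / 100.0)


# (company, low estimate, high estimate, YoY growth at low end, YoY growth at high end)
# Net profit in millions of yuan, as quoted in the summary above.
FORECASTS = [
    ("Longchuan Technology", 400.0, 450.0, 180.67, 215.75),
    ("Dinglong Co.", 190.0, 220.0, 19.89, 38.82),
]


def check(forecasts, tolerance=0.05):
    """Return {company: (base from low end, base from high end, consistent?)}.

    `tolerance` (millions of yuan) is an illustrative threshold that absorbs
    rounding in the published percentages.
    """
    results = {}
    for name, low, high, g_low, g_high in forecasts:
        base_low = implied_base(low, g_low)
        base_high = implied_base(high, g_high)
        results[name] = (base_low, base_high, abs(base_low - base_high) <= tolerance)
    return results


if __name__ == "__main__":
    for name, (b_lo, b_hi, ok) in check(FORECASTS).items():
        print(f"{name}: implied prior-year profit {b_lo:.2f} / {b_hi:.2f}, consistent={ok}")
```

Both ends of each range imply the same base (roughly 142.5 million yuan for Longchuan Technology and 158.5 million yuan for Dinglong Co.), so the quoted ranges are internally consistent.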
Sora 2 Released, Further Boosting Demand for Computing Power and Storage
Yin He Zheng Quan· 2025-10-13 08:36
Investment Rating
- The report maintains a "Recommended" investment rating for the semiconductor industry [1].

Core Insights
- OpenAI's release of Sora 2 is expected to further drive demand for computing power and storage [3].
- The semiconductor industry is in a phase of rapid development, with domestic storage manufacturers contributing significantly to capital expenditure at wafer foundries [3].
- Advanced packaging is becoming increasingly important, driven by new applications in AI and high-performance computing (HPC) [3].
- AI growth is propelling demand for digital chips, with a notable increase in the need for CPUs, GPUs, and high-performance storage chips [3].
- The report highlights the potential for a cyclical upswing in the storage chip industry driven by advances in AI [3].

Summary by Sections

Semiconductor Equipment and Materials
- Major domestic foundry SMIC maintains capital expenditure at USD 7-8 billion per year [3].
- Changxin Technology's IPO guidance status has changed to "Acceptance of Guidance" [3].
- Longchuan Technology expects a net profit of RMB 400-450 million for Q3, a year-on-year increase of 180.67%-215.75% [3].
- Domestic semiconductor materials are gradually being localized, with companies such as Dinglong Co. forecasting a net profit of RMB 190-220 million for Q3, a year-on-year increase of 19.89%-38.82% [3].

Integrated Circuit Packaging and Testing
- The packaging and testing sector is growing rapidly and upgrading its technology, with advanced packaging becoming a key path to higher performance [3].

Analog and Digital Chip Design
- Demand is recovering in consumer electronics, enterprise, and industrial markets, while the automotive market has not yet shown signs of recovery [3].
- The report emphasizes the emergence of a "GPU+ASIC" heterogeneous computing model, driven by major cloud providers' investments in self-developed ASICs [3].

Investment Recommendations
- The report suggests focusing on companies such as Chip Yuan Co., Cambricon, and SMIC for their potential in AI infrastructure and storage chips [3].
Some Project Collaborations, Compensation Open~
具身智能之心· 2025-10-13 04:02
Core Insights
- The company aims to empower partners and small businesses in areas such as solution development, data collection, technology upgrades, and corporate training [1]
- The company is inviting practitioners in embodied intelligence worldwide to collaborate on technical services, training, course development, and research guidance [1]

Company Overview
- The company, "Embodied Intelligence Heart" (具身智能之心), is a leading creative platform in China's embodied intelligence sector, offering online education, offline training, corporate consulting, promotional services, hardware R&D, and solutions [3]

Main Directions
- Focus areas include, but are not limited to: VLA, VLN, Diffusion Policy, Reinforcement Learning, VLA+RL, remote operation, motion capture, sim2real, multimodal large models, simulation, motion control, end-to-end systems, and 3D perception [5]

Job Description
- The positions mainly cover embodied-intelligence course development, solution R&D, hardware development, and training collaboration, serving B-end clients such as enterprises, universities, and research institutes, and C-end clients such as students and job seekers [6]

Contact Information
- Interested parties can add WeChat oooops-life for further inquiries [7]
In the Final Window for Intelligent Driving, a New AI Player Breaks Out
远川研究所· 2025-10-12 13:04
Core Insights
- The intelligent assisted driving industry has seen a stark contrast over the past year: technological advances have raised consumer demand and cut costs, allowing L2+ systems to reach the mid-to-low-end market [2][4][5]
- Competition is intensifying and leading players are clearly emerging; companies must adapt to new technological paradigms to stay relevant [2][9]
- The rise of multi-modal large models and end-to-end systems is reshaping the industry, and companies such as Qianli Technology are positioning themselves to take advantage of these advances [12][21]

Industry Dynamics
- The shift from modular to end-to-end architectures in intelligent driving systems is becoming standard, as exemplified by Tesla's FSD V9.0, which emphasizes a pure vision-based approach [4][5][6]
- Software is projected to exceed 40% of total vehicle value in intelligent driving systems, indicating a significant shift toward software-driven solutions [6][18]
- The competitive landscape mixes vertically integrated companies such as Tesla with third-party suppliers, highlighting the importance of collaboration and resource integration [9][18]

Company Developments
- Qianli Technology, founded by AI pioneer Yin Qi, aims to become a platform-level AI company focused on intelligent assisted driving and smart cockpit solutions [11][21]
- The company has established partnerships with major automotive players, including Geely, to strengthen its market presence and technological capabilities [17][25]
- Qianli Technology's RLM (Reinforcement Learning-Multi-modal) model is gaining attention for improving driving experience and safety through advanced perception and decision-making [21][24]

Future Trends
- The integration of multi-modal large models and reinforcement learning is expected to be crucial for future intelligent driving systems, improving their adaptability and safety [20][22]
- The global market for automated and intelligent driving vehicles is projected to reach $1.2 trillion by 2040, with significant growth opportunities for companies such as Qianli Technology [25]
- Robotaxi services are a key focus for Qianli Technology, which aims to establish a comprehensive operational framework within 18 months [27]
Douyin & LV-NUS Open-Source a New Multimodal Model: Punching Above Its Weight to Set New SOTA, 8B Reasoning on Par with GPT-4o
量子位· 2025-10-12 07:30
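Submitted by the SAIL-VL2 team
量子位 | official account QbitAI

The 2B model ranks first among open-source models under 4B parameters on multiple benchmarks.

SAIL-VL2 is a multimodal large model jointly released by Douyin's SAIL team and LV-NUS Lab.

At small-to-medium scales such as 2B and 8B parameters, SAIL-VL2 achieves performance breakthroughs on 106 datasets. On complex-reasoning benchmarks such as MMMU and MathVista in particular, it outperforms models of the same scale and even rivals larger closed-source models.

Methodologically, SAIL-VL2 innovates along three dimensions (data, training, and architecture), offering the community a new paradigm in which "small models can also be highly capable".

SAIL-VL2 combines fine-grained visual perception with complex-reasoning performance comparable to larger models. The team has also open-sourced the model and inference code, providing an extensible multimodal foundation model.

Pretrain: three core innovations

MoE architecture: balancing parameters and compute

Architecture: sparse MoE + flexible encoder, balancing performance and efficiency

SAIL-VL2 moves beyond the conventional dense LLM architecture by introducing sparse Mixture-of-Experts (MoE), and offers model configurations at multiple sizes to suit different scenarios:

| Model | Vision Encoder | Language Model | #Param |
| --- | --- | --- | --- |

...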
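To make the "sparse MoE" idea mentioned above concrete, here is a minimal sketch of generic top-k expert routing for a single token vector, written in plain Python. It is not SAIL-VL2's implementation: the linear router, the expert functions, `top_k=2`, and all names are hypothetical and chosen only to show why total parameters can grow while per-token compute stays bounded.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route one token vector `x` to its `top_k` experts and mix their outputs.

    router_weights: one weight vector per expert (a hypothetical linear router).
    experts: callables mapping a vector to a vector of the same length.
    Returns (mixed output vector, indices of the experts that were evaluated).
    """
    # Router score per expert: dot product of x with that expert's router row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    # Sparsity: only the top_k experts are evaluated; the others are skipped.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm  # renormalise gates over the selected experts
        y = experts[i](x)
        output = [o + gate * y_j for o, y_j in zip(output, y)]
    return output, chosen


if __name__ == "__main__":
    # Four toy experts that simply scale their input by 1, 2, 3, and 4.
    toy_experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
    toy_router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    out, used = moe_forward([2.0, 1.0], toy_router, toy_experts, top_k=2)
    print("experts used:", used, "output:", out)
```

In this toy setup only two of the four experts run for the token, which is the parameter-versus-compute trade-off the heading refers to; production MoE layers add batching, load-balancing losses, and learned expert networks that this sketch leaves out.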