Multimodal Large Models
As Sector Divergence Intensifies, AI's Strongest Tailwinds of 2026 Arrive
36Kr· 2025-12-03 08:57
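This is no longer the patchwork of "AI+", but AI-native design rebuilding the underlying logic of systems; no longer confined to generation and understanding in the digital world, but physical AI closing the action loop between the virtual and the real; no longer single modalities fighting alone, but multimodal technology fusing everything together; and world models are taking AI from "answering from data" to "anticipating from underlying laws".

A transformation touching technical architecture, application form, and cognitive level has arrived. Which of these will become the strongest tailwind, reshaping industries and defining the future?

AI-native triggers a bottom-level revolution in system applications

When the iteration speed of algorithms and models outruns the industry's imagination, and when AI leaps from a tool behind the screen to a "participant" that permeates reality, 2026 will become a key watershed in the development of artificial intelligence.

If "AI+" means "patching" or "bolting on" AI functions to an existing system, AI-native means making AI the underlying logic and capability hub of system design. Such a system is born for AI and grows because of AI, driving an all-round reshaping of technical architecture, business processes, organizational roles, and ways of creating value.

This transformation is not a simple stacking of features. It rebuilds the development paradigm around generative AI, making intelligence a native attribute of an application rather than an add-on capability. The move from "AI+" to "AI-native" is becoming a key direction for AI's future development.

| Dimension | Traditional "AI+" architecture | AI-native architecture |
| --- | --- | --- |
| Design starting point | Existing business processes | AI capability boundaries |
| Data ...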
China's First AI Glasses for the Visually Impaired Released: 300 ms Ultra-Low Latency, Connected to Tongyi Qianwen
Feng Huang Wang· 2025-12-03 07:14
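According to statistics, China has more than 17 million visually impaired people. Lacking efficient assistive tools other than the white cane, they depend heavily on human help to get around, which leads many of them to choose to "go out less". Chen Gang, marketing and technology director at Hangzhou Tongxing Technology (杭州瞳行科技), said that large-model technology has brought a turning point for the industry, with compute costs down to one tenth of their previous level. Through a technical path of "base model reuse + fine-tuning", companies can quickly implement complex functions, including a voice assistant and one-tap requests for help from family and friends, with a lower barrier to entry. The AI glasses are now officially on the market.

On the hardware side, the glasses carry dual 121-degree ultra-wide-angle cameras, and the full kit consists of four parts: the glasses themselves, a phone, a remote-control ring, and a white cane. At the core compute and algorithm level, the technical team tuned the models for different usage scenarios. In mobile obstacle avoidance, the system achieves an ultra-low latency of 300 ms: each time the user takes a step, the glasses complete one round of environment analysis and path prompting, giving only a brief summary of key obstacles such as road signs and vehicles. In scenarios such as reading a menu or looking for a shop, the large model switches strategy and gives a detailed summary and spoken readout of the text and environmental details.

Feng Huang Wang Tech, December 3: On the International Day of Persons with Disabilities, Hangzhou Tongxing Technology officially released China's first AI glasses for the visually impaired based on a multimodal large model. The product connects to Tongyi Qianwen's Qwen-VL and OCR series models and aims, through a "vision model + hardware" combination, to solve the "last ten meters" wayfinding problem that visually impaired people often face in travel navigation. ...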
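The scene-dependent behaviour reported above (a 300 ms budget with brief obstacle summaries while walking, and detailed readouts when reading a menu or finding a shop) can be sketched as a small routing table. This is only an illustration under stated assumptions: the scene names, the `SceneProfile` type, and the instruction strings are hypothetical and are not taken from the product or from any Qwen-VL API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SceneProfile:
    """Per-scene settings. All field values below are illustrative assumptions."""
    latency_budget_ms: Optional[int]  # None: the article states no budget for this scene
    instruction: str                  # hypothetical instruction sent to the vision model


# Hypothetical scene table mirroring the two behaviours described in the article.
PROFILES = {
    "obstacle_avoidance": SceneProfile(
        latency_budget_ms=300,
        instruction="Briefly name key obstacles such as road signs and vehicles.",
    ),
    "reading": SceneProfile(
        latency_budget_ms=None,
        instruction="Summarize the visible text and surrounding details in full.",
    ),
}


def select_profile(scene: str) -> SceneProfile:
    """Return the profile for a scene; raises KeyError for unknown scenes."""
    return PROFILES[scene]


def within_budget(profile: SceneProfile, elapsed_ms: float) -> bool:
    """True when a response arrived inside the scene's latency budget (if it has one)."""
    return profile.latency_budget_ms is None or elapsed_ms <= profile.latency_budget_ms


if __name__ == "__main__":
    walking = select_profile("obstacle_avoidance")
    print(walking.instruction, within_budget(walking, 250))
```

Keeping the budget and the instruction in one record makes the trade-off explicit: the walking scene gives up detail to stay inside one step's worth of time, while the reading scene does the opposite.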
CES 2026 Preview: AI Is the Core Topic, and Chinese Companies May Dominate the Show Again
36Kr· 2025-12-01 04:09
Core Insights
- CES 2026 is set to showcase significant advancements in AI technology, with major companies like Siemens, Caterpillar, AMD, and Lenovo focusing on AI in their presentations [5][8][19]
- The event will highlight a variety of AI hardware products, including AI glasses, AI PCs, AI smartphones, and humanoid robots, indicating a strong trend towards AI integration in consumer electronics [18][19]
- Chinese brands are expected to dominate CES, showcasing their technological innovations across various categories, reflecting their growing influence in the global market [40][41]

AI as the Central Theme
- AI will be the overarching theme of CES 2026, with confirmed keynote speeches from industry leaders emphasizing its importance [5][19]
- Companies like Siemens will demonstrate how AI and digital twin technology can transform manufacturing and infrastructure [8]
- Lenovo plans to unveil innovations related to AI-driven experiences, including applications in sports and personalized user interactions [11]

PC and Gaming Innovations
- Intel, AMD, and NVIDIA are anticipated to launch new products, including Intel's Panther Lake mobile processors and AMD's R9 9950X3D processor with enhanced cache capabilities [19][21]
- The introduction of new gaming processors and graphics cards is expected to attract significant attention from the gaming community [21][22]

Display Technology Competition
- Major TV manufacturers, including TCL and Hisense, are expected to showcase advancements in RGB display technology, competing with international brands like LG and Samsung [25][26]
- CES 2026 will feature a variety of display technologies, including Micro RGB LCD and Mini LED, highlighting the competitive landscape in the display sector [25][26]

Smart Cleaning Devices
- Chinese smart cleaning brands are set to unveil new products, including robotic vacuums and lawn mowers, reinforcing their leadership in the global smart cleaning market [27][30]
- The focus will be on comprehensive cleaning solutions that leverage AI and advanced navigation technologies [30]

Accessory and Audio Innovations
- Accessory brands like Baseus and Ugreen are expected to expand their product lines beyond traditional charging devices, venturing into audio and smart home solutions [31][34]
- The introduction of high-end audio products and smart home security devices will be a key focus for these brands at CES 2026 [36]

AI Glasses and New Hardware
- AI glasses are anticipated to be a major highlight, with various brands competing in this emerging category [38]
- The presence of established players and new entrants in the AI hardware space will create a dynamic showcase of innovative products [39]

Chinese Brands' Dominance
- Chinese companies are projected to play a pivotal role at CES, with a significant share of exhibitors and a focus on technological innovation rather than just cost competitiveness [40][41]
- The event serves as a platform for Chinese brands to demonstrate their rapid product development and engineering capabilities across multiple tech sectors [40][41]
Qwen3-VL Multimodal Model, Illustrated
自动驾驶之心· 2025-11-29 02:06
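Author | 阿杰不敲代码时
Source | 阿杰不敲代码时
Original article: Qwen3-VL Multimodal Model, Illustrated

Not long ago I wrote an article about VLMs. Whether because of the content or the title, readers did not seem very interested. But if you want to understand the internals of Qwen3-VL and your fundamentals are shaky or missing, that article is still worth reading. I am assuming here that you have read it, so I will not repeat that background. Experienced readers who already have the basics can skip the first article and start with this one. If anything I have written is wrong, corrections and criticism are welcome.

Vision-language models (VLMs) are autoregressive AI models that take text and images as input. In this article we will also look in detail, starting from the source code, at how the Qwen3-VL model ...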
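To make "autoregressive, with text and images as input" concrete, here is a toy sketch of the two ideas involved: the image is cut into patches that are spliced into the text sequence, and output tokens are produced one at a time from everything that came before. It is not Qwen3-VL's implementation. The `<image>` placeholder, the function names, and the scripted stand-in model are all assumptions for illustration; a real VLM typically passes patches through a vision encoder and maps them into the language model's embedding space.

```python
from typing import Callable, List, Tuple

IMAGE_PLACEHOLDER = "<image>"  # hypothetical marker, not a real tokenizer symbol
Item = Tuple[str, object]      # ("txt", token) or ("img", flattened patch)


def patchify(image: List[List[int]], patch: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping patch x patch tiles (row-major, flattened)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles


def build_sequence(prompt: List[str], image: List[List[int]], patch: int) -> List[Item]:
    """Replace the image placeholder with one sequence entry per image patch."""
    seq: List[Item] = []
    for tok in prompt:
        if tok == IMAGE_PLACEHOLDER:
            seq.extend(("img", tuple(t)) for t in patchify(image, patch))
        else:
            seq.append(("txt", tok))
    return seq


def generate(seq: List[Item], next_token: Callable[[List[Item]], str],
             max_new_tokens: int, eos: str = "<eos>") -> List[str]:
    """Autoregressive loop: predict a token from the full context, append it, repeat."""
    out: List[str] = []
    for _ in range(max_new_tokens):
        tok = next_token(seq + [("txt", t) for t in out])
        if tok == eos:
            break
        out.append(tok)
    return out


def make_toy_model(prompt_len: int) -> Callable[[List[Item]], str]:
    """Stand-in for a trained model: replays a fixed script based on context length."""
    script = ["a", "4x4", "grid", "<eos>"]
    return lambda context: script[min(len(context) - prompt_len, len(script) - 1)]


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    seq = build_sequence(["Describe", IMAGE_PLACEHOLDER, "briefly"], image, patch=2)
    print(len(seq), generate(seq, make_toy_model(len(seq)), max_new_tokens=10))
```

The point of the sketch is the data flow: after `build_sequence`, image patches and text tokens live in one ordered sequence, so the same next-token loop used for text-only models applies unchanged.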
Gaming Sector Strengthens in Volatile Morning Trading; Gaming ETF (159869) Up Nearly 1%
Mei Ri Jing Ji Xin Wen· 2025-11-27 04:34
Group 1
- The gaming sector is experiencing a strong upward trend, with the gaming ETF (159869) rising nearly 1% in early trading on November 27, 2025, driven by leading stocks such as Giant Network, Kaixin Network, and Youzu Network [1]
- Citic Securities reports that the gaming industry continued to show high growth in revenue and profit in Q3 2025, supported by leading companies and a regular issuance schedule of game licenses [2]
- The gaming sector is expected to benefit from AI, content, and commercialization model transformations, with the gaming ETF (159869) tracking the performance of A-share listed companies in the animation and gaming industry [2]

Group 2
- Google's release of the Nano Banana Pro showcases its strong capabilities in the multimodal large model field, integrating advanced understanding and rendering capabilities, which can enhance content creation across various industries [1]
- The Nano Banana Pro supports 2K and 4K resolutions, catering to professional production needs, and reflects a broader trend of improving multimodal capabilities and decreasing usage barriers in the market [1]
Senior Model Expert Interprets Google Gemini
2025-11-26 14:15
Summary of Key Points from the Conference Call

Company and Industry Overview
- The conference call primarily discusses **Google's Gemini 3 Pro**, a state-of-the-art multimodal AI model that showcases significant advancements in visual understanding and processing capabilities across various data types including text, images, audio, video, and code [1][2][4][5].

Core Insights and Arguments
- **Performance and Innovation**: Gemini 3 Pro is recognized as the world's strongest visual understanding model, leading in 20 out of 21 evaluation dimensions. It introduces the **Deepseek mode** to reduce hallucination rates and employs the **Mamba principle** to optimize the relationship between Transformer inference power and sequence length, enhancing the processing of long series data [2][4][7].
- **Training Methodology**: The model is trained on **14TB of data** using a GPU-based adaptive intelligent optimization paradigm. It utilizes a segmented training approach combined with reinforcement learning and test-time strategies to improve abstract reasoning capabilities [4][5].
- **Multimodal Capabilities**: Gemini 3 Pro is designed as a native multimodal model, capable of unified encoding and processing of various data types. This design allows for powerful multimedia content generation and understanding, significantly enhancing user experience [5][6].
- **Comparative Performance**: While Gemini 3 Pro excels in humanities and emotional intelligence dimensions, it does not surpass competitors like Claude 4.5 in programming capabilities, where Claude scores **80.9** compared to Gemini's lower performance [2][7].

Additional Important Insights
- **Challenges in Asian Markets**: Overseas models struggle with processing Chinese content due to a lack of focus on Eastern elements during development, leading to issues in accurately displaying Asian language characters. This presents a barrier for these models in the Chinese market [9][12].
- **Technological Advantages of TPU**: Google’s use of its proprietary TPU chips for large-scale model training offers advantages such as lower costs, higher energy efficiency, and greater memory capacity compared to competitors using NVIDIA GPUs [10][16].
- **Future Competitive Landscape**: The AI landscape is evolving into a three-way competition among Google, Grok, and OpenAI. While Google currently leads, it is anticipated that Grok may close the gap, with OpenAI also showing potential in multimodal capabilities [10][11].
- **Knowledge Graphs and AI Hallucination**: Knowledge graphs are being explored as a means to reduce AI hallucination rates by providing verified information, although widespread application remains a challenge due to data acquisition costs and industry-specific requirements [21].

Conclusion
- Google’s Gemini 3 Pro sets a new standard in the AI industry with its comprehensive capabilities and innovative training methods. However, challenges remain in addressing language processing for Asian markets and maintaining competitive advantages against emerging rivals.
Rockchip Launches RK182X Series AI Co-processors
Ju Chao Zi Xun· 2025-11-26 13:10
Core Insights
- The launch of the RK182X series by Rockchip (瑞芯微) positions the company in the high-performance AI co-processor market, targeting local AI inference tasks through high-speed interconnects [1][3]
- The RK182X series integrates a multi-core high-performance NPU, supporting local deployment of large language models (LLMs) with 3B/7B parameters and enhancing capabilities in processing multimodal data [3][4]
- The innovative 3D stacking of logic chips and memory in the RK182X series allows for a theoretical bandwidth of up to 1TB/s, significantly improving local model inference throughput [3][4]

Product Features
- The RK182X series features built-in 2.5GB or 5GB high-bandwidth DRAM, enabling compact system design and higher bandwidth [3]
- It connects to host systems via PCIe 2.0 or USB 3.0, allowing for easy integration into existing architectures without major modifications, thus lowering the entry barrier for local AI model adoption [3][4]

Market Trends
- The introduction of the RK182X series aligns with the rising demand for edge computing capabilities and the implementation of multimodal large models in the industry [4]
- The product's development reflects a shift from general-purpose SoCs to specialized AI co-processors among domestic chip manufacturers, indicating a trend towards more tailored solutions in the AI sector [4]
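The figures above (3B/7B models, 2.5GB or 5GB of on-package DRAM, up to 1TB/s of bandwidth) invite a quick back-of-the-envelope check. The sketch below is only an estimate under loud assumptions that the summary does not state: weights are quantized to 4 or 8 bits, GB means 10^9 bytes, 3B is paired with 2.5GB and 7B with 5GB, and decoding one token reads every weight once (dense model, batch size 1, KV cache and activations ignored). The resulting rate is a bandwidth ceiling, not a measured or vendor-quoted number.

```python
GB = 1e9  # decimal gigabytes; an assumption, the summary does not say GB vs GiB


def weight_bytes(params_billion: float, bits_per_weight: int) -> float:
    """Bytes needed for the weights alone (no KV cache, no activations)."""
    return params_billion * 1e9 * bits_per_weight / 8


def fits_in_dram(params_billion: float, bits_per_weight: int, dram_gb: float) -> bool:
    """True when the weights alone fit in the given DRAM capacity."""
    return weight_bytes(params_billion, bits_per_weight) <= dram_gb * GB


def decode_rate_upper_bound(params_billion: float, bits_per_weight: int,
                            bandwidth_bytes_per_s: float) -> float:
    """Tokens/s ceiling if every generated token streams all weights from DRAM once."""
    return bandwidth_bytes_per_s / weight_bytes(params_billion, bits_per_weight)


if __name__ == "__main__":
    bandwidth = 1e12  # the "up to 1TB/s" theoretical figure from the summary
    # Pairing 3B with 2.5GB and 7B with 5GB is an assumption made for illustration.
    for params, dram in [(3, 2.5), (7, 5.0)]:
        for bits in (4, 8):
            size_gb = weight_bytes(params, bits) / GB
            ok = fits_in_dram(params, bits, dram)
            rate = decode_rate_upper_bound(params, bits, bandwidth)
            print(f"{params}B @ {bits}-bit: {size_gb:.1f} GB, "
                  f"fits in {dram} GB: {ok}, ceiling {rate:.0f} tok/s")
```

Under these assumptions, 4-bit weights fit in both pairings while 8-bit weights do not, which is one plausible reading of why the two DRAM sizes line up with the two model sizes.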
Embodied AI: Paper "Rescue" Has Arrived!
具身智能之心· 2025-11-26 10:00
Core Viewpoint
- The article promotes a comprehensive thesis guidance service that addresses various challenges faced by students in research and writing, particularly in advanced fields like multimodal models and robotics.

Group 1: Thesis Guidance Service
- The service offers one-on-one customized guidance in cutting-edge research areas such as multimodal large models, visual-language navigation, and embodied intelligence [1][2].
- It provides full-process, closed-loop support covering topic innovation, experimental design, code debugging, writing, and submission strategies to help produce high-quality results quickly [2].
- The guidance is provided by a team of experienced mentors from prestigious institutions like CMU, Stanford, and MIT, with experience at top-tier conferences [1][3].

Group 2: Dual Perspective Approach
- The service emphasizes both academic publication and practical application, focusing on real-world value such as improving the robustness of robotic grasping and optimizing navigation in real time [3].
- The first 10 students to inquire can be matched free of charge with dedicated mentors for in-depth analysis and tailored publication advice [4].
具身智能之心 Technical Exchange Group Launched!
具身智能之心· 2025-11-26 10:00
Group 1
- A technical exchange group focused on embodied intelligence has been established, covering areas such as VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested individuals can add the assistant's WeChat AIDriver005 to join the community [2]
- To expedite the joining process, include a note with your institution/school, name, and research direction [3]
Qiniu Intelligent Rises 5%; Company Focuses on Multimodal Large Models, with AI-Related Revenue Reaching CNY 184 Million in H1
Zhi Tong Cai Jing· 2025-11-25 03:28
Core Viewpoint
- Qiniu Intelligent (02567) shares rose 5% to HKD 0.63, driven by its integrated MPaaS technology and focus on AI capabilities [1]

Group 1: Company Strengths
- Through years of technological accumulation, the company holds key technologies for one-stop, scenario-based audio and video solutions, including audio and video technology, low-code platforms, and AI capabilities [1]
- With the integration of AIGC technology, the company aims to focus on multimodal large models and empower its APaaS business through scenario-based development to meet customer needs [1]

Group 2: Financial Performance
- In the first half of this year, Qiniu Intelligent's AI-related revenue reached CNY 184 million, accounting for 22.2% of total revenue, primarily from AI inference services and computing resource leasing [1]
- As of August 2025, the number of developers on the Qiniu Intelligent platform had exceeded 1.69 million, with new registrations continuing to increase [1]

Group 3: Market Expansion
- The company plans to accelerate its overseas business expansion to increase its share of international markets [1]
- Demand for inference computing power for AI application development keeps rising, and the number of AI-related users has quickly grown to 15,000 [1]