Multimodal Large Models
AI + Education: A Vastly Underestimated Track
Feng Huang Wang· 2025-09-29 12:29
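Summary: Truly advanced AI teachers are beginning to appear. On the night GPT-4o was released in May 2024, many people working in education lost sleep, because during that launch the founder of Khan Academy was invited to demonstrate real-time voice tutoring on a middle-school geometry problem. In the demo, GPT-4o effectively became an online teacher, which sent a shock through the education industry. More than a year later, the industry has instead breathed a sigh of relief, because OpenAI has not pushed hard on AI + education since. General-purpose large models are no longer the industry's formidable rival; on the contrary, they have unlocked the potential of the AI + education market, and players from across the education sector have quietly begun investing. At the Apsara Conference (云栖大会), which opened on September 24 this year, 好未来 (TAL Education) exhibited its 九章爱学 multi-scenario smart education solution and products including the flagship 学而思 learning device T4 in Hall 1 and Hall 3 respectively. As the only education technology company exhibiting in Hall 1, its booth drew many parents and education practitioners. One of the core highlights on display was improved multimodal capability. Parents only need to open features such as AI grading and AI writing guidance to experience a range of AI teaching processes. Unlike earlier conversational AI, the introduction of multimodal capability lets the learning device see and understand: it can scan a student's homework and grade it in real time, and can even grade, explain, and show the solution approach in real time based on what is written on the paper. This means that, backed by a new generation of multimodal large models, truly advanced AI teachers are beginning to appear, ...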
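奇多多 AI Learning Companion Debuts at the 2025 Apsara Conference (云栖大会): 无界方舟 Uses an AI "Discerning Eye" to Usher In the Era of Intelligent Early Education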
Cai Fu Zai Xian· 2025-09-29 10:24
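At the recently held 2025 Apsara Conference (云栖大会), 奇多多, an AI learning-companion robot launched by 无界方舟 and described as China's first built on an "end-to-end real-time multimodal interaction model", became the focus of the event. Within just one week of opening for pre-sale on JD.com (京东), the product sold more than 10,000 units. The figure not only reflects market demand for high-quality AI early-education products, but also signals that the commercialization of multimodal large models in consumer hardware is dawning. As one mother at the event put it: "Great! It's finally more than an AI toy; it solves a lot of early-education pain points." As AI technology matures, 奇多多's success may confirm that in early education, "function-driven" products win over the market better than "concept hype". More notably, during the show 奇多多 received on-site pre-orders from more than a hundred parents and attracted follow-up cooperation opportunities from dozens of AI products looking to integrate 无界方舟's EVA model, making it the AI hardware product with the most commercial potential at this year's conference. A packed show floor, with 奇多多 showing its real strength: at the 奇多多星球 booth in Hall 3 (the Frontier Applications Hall), 奇多多 drew large numbers of visitors and young children who stopped to try it. Children held picture books, exercise sheets, toys, and drawings and interacted naturally with 奇多多, and the atmosphere was lively. What 奇多多 showed was not just voice interaction but genuine multimodal understanding. It can recognize any picture book, textbook, card, or other reading material in a child's hand; whether the text is Chinese or English, or even set in the complex, scattered mixed layouts typical of children's books, it can ...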
Report: Top AI Expert Joins Alibaba's Tongyi, With Implications for Next-Generation Large Models
3 6 Ke· 2025-09-29 09:56
Core Insights
- The article discusses the recent recruitment of AI expert Steven Hoi by Alibaba's Tongyi Lab, indicating a strategic shift towards foundational research in multimodal large models [2][4][7]
- Hoi's extensive background in AI, including over 20 years of experience and significant academic contributions, positions him as a key asset for Alibaba in enhancing its AI capabilities [2][4]
- The move reflects Alibaba's commitment to accelerating the development of multimodal AI technologies, which are crucial for the company's competitive positioning in the global AI landscape [7][10]
Group 1: Steven Hoi's Background and Role
- Steven Hoi has over 20 years of experience in AI and has published more than 300 academic papers, with over 50,000 citations, making him one of the top 1% AI scientists globally [2]
- He previously served as Vice President at Salesforce, where he built the AI research ecosystem in Asia from the ground up [2][4]
- Hoi joined Alibaba in February 2025 as Vice President and Chief Scientist of the Intelligent Information Business Group, focusing on multimodal foundational models and applications [4]
Group 2: Strategic Implications for Alibaba
- Hoi's transition to the Tongyi Lab team suggests a significant talent reallocation within Alibaba, emphasizing the importance of foundational research in AI [7]
- Alibaba's Tongyi Lab is currently in a critical phase of "speed of iteration" and "multimodal development," necessitating top-tier talent like Hoi to drive innovation [7][10]
- The company aims to enhance its competitive edge by rapidly iterating AI models and advancing from unimodal to multimodal capabilities, which is seen as an inevitable trend in the industry [7][10]
Group 3: Challenges and Opportunities in Multimodal AI
- Hoi highlighted several technical challenges in developing unified multimodal models, including the scarcity of models that support full multimodal interaction and the difficulty in balancing understanding and generation across different modalities [10]
- He emphasized that the era of multimodal Agent AI is just beginning, with many technical hurdles to overcome before achieving Artificial General Intelligence (AGI) [10]
- The challenges present significant opportunities for growth and innovation within the multimodal AI sector, as the industry seeks to address these issues [10]
Mech-Mind Robotics (梅卡曼德机器人) Reportedly Files Confidentially for a Hong Kong IPO, Expected to Raise HK$1.56 Billion
Zhi Tong Cai Jing· 2025-09-25 01:52
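Mech-Mind (梅卡曼德) has built deep expertise in core technologies including multimodal large models, imaging algorithms, AI recognition algorithms, robotics algorithms, AI software, and core optical, mechanical, and electrical components, and it holds a large body of real-world scenario data. According to media reports, Mech-Mind Robotics, a global "unicorn" in embodied-intelligence robotics, has confidentially filed for a Hong Kong listing and expects to raise US$200 million (about HK$1.56 billion). People familiar with the matter said the Meituan-backed intelligent robotics company is in talks with advisers, but details such as the final size of the raise have not been settled. Public information shows that Mech-Mind Robotics was founded in 2016 by a team of Tsinghua alumni who had returned from overseas, and that it is committed to making embodied-intelligence robots ubiquitous. Its products include industrial-grade 3D cameras, robot programming software, and machine vision software; among them, its general-purpose embodied-intelligence robot "eye-brain-hand" (眼脑手) has been the first to reach cross-industry, large-scale, global deployment. Mech-Mind has reportedly received multiple rounds of backing from well-known investors including IDG Capital, Meituan, Sequoia China (红杉中国), Source Code Capital, Intel Capital, and Qiming Venture Partners, with cumulative funding of more than RMB 2 billion. In August this year, Mech-Mind completed its latest funding round, worth about RMB 500 million. Investors in the round included 雄安基金, 大洋电机 (002249), 华创资本, 中金保时捷基金, 上河动量基金, 南翔创投, 海河基金, 河北结构调整基金, and 天创资本. The proceeds will be used to accelerate Mech-Mind's full-stack embodied-intelligence "eye-brain-hand" technology ...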
Baidu Open-Sources Qianfan-VL: Fully Domestic, Self-Developed Kunlun Chips (昆仑芯) Deliver World-Class Results
Xuan Gu Bao· 2025-09-25 00:14
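Baidu has open-sourced its new vision understanding model, Qianfan-VL. The Qianfan-VL series comes in three versions, 3B, 8B, and 70B, ordered by parameter count, each aimed at different application scenarios.

| Model | Context length | Thinking support | Use cases |
| --- | --- | --- | --- |
| Qianfan-VL-3B | 32k | No | On-device real-time scenarios, OCR text recognition |
| Qianfan-VL-8B | 32k | Yes | Server-side general scenarios, fine-tuning scenarios |
| Qianfan-VL-70B | 32k | Yes | Offline data synthesis, complex reasoning and computation scenarios |

The model was trained from start to finish on Baidu's own chip, the Kunlun P800 (昆仑芯P800). Model performance and applications: Qianfan-VL is a multimodal large model, that is, an AI that can both read images and understand text. Given a complex chart, it can analyze the data and trends in it. Its two core strengths are OCR (optical character recognition) and deep optimization for education scenarios. When you photograph an ID card and the system automatically fills in your name and ID number, that is OCR. Qianfan-VL extends this capability to full-scenario coverage: printed text, handwriting, artistic lettering tucked away on street signs and product packaging, and even math exam papers ...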
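The lineup table can be captured as a small lookup structure, which makes the trade-off between the three sizes easier to query. This is an illustrative sketch only: `QianfanVLModel`, `LINEUP`, and `pick_models` are hypothetical names and not part of any Baidu SDK, the values are copied from the table, and reading "32k" as 32 × 1024 tokens is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QianfanVLModel:
    """One row of the Qianfan-VL lineup table (hypothetical helper type)."""
    name: str
    context_tokens: int       # "32k" in the table; 32 * 1024 is an assumption
    supports_thinking: bool
    use_cases: tuple


# Values transcribed from the table in the article summary.
LINEUP = (
    QianfanVLModel("Qianfan-VL-3B", 32 * 1024, False,
                   ("on-device real-time", "OCR text recognition")),
    QianfanVLModel("Qianfan-VL-8B", 32 * 1024, True,
                   ("server-side general", "fine-tuning")),
    QianfanVLModel("Qianfan-VL-70B", 32 * 1024, True,
                   ("offline data synthesis", "complex reasoning")),
)


def pick_models(need_thinking: bool) -> list:
    """Return the names of models that satisfy the thinking requirement.

    If thinking is not required, every model qualifies.
    """
    return [m.name for m in LINEUP if m.supports_thinking or not need_thinking]
```

Calling `pick_models(True)` keeps only the 8B and 70B entries, which matches the "Thinking support" column.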
Qwen3-VL, Awaited for More Than Half a Year, Is Finally Open Source Too!
自动驾驶之心· 2025-09-24 06:35
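Author and source: 刘聪NLP. Riding the Apsara Conference (云栖大会) with a burst of open-source releases, huh: in two days the team open-sourced the Qwen3-Omni series, Qwen-Image-Edit-2509, Qwen3-VL, and the Qwen3Guard-Gen series, 12 models in total. There are also APIs that were not open-sourced, such as Qwen-TTS, Qwen3-Coder-Plus, Qwen3-Max, Qwen3-LiveTranslate, and so on. PS: I can't stand 俊旸! Open-sourcing in the small hours every single day~ Honestly, there is no way to test it all. Everyone knows I have been waiting for the Qwen3 VL model, so the other models can wait; today I'm testing the VL model first. Let's start with the model itself. Compared with Qwen2.5-VL, Qwen3-VL improves in the following areas. Vision encoder: Qwen3-VL keeps the earlier VisionPatchEmbed with Conv3d, but patch_size grows from 14 to 16, and the activation function changes from silu to gelu_pytorch_tanh. Projector: on top of the previous MLP-based projector, it additionally adds DeepS ...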
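To make the two vision-encoder changes concrete, here is a minimal sketch in plain Python (no deep-learning framework needed). It is not Qwen's code: the 448 × 448 input size is an illustrative assumption, `patches_per_frame` is a hypothetical helper, and only the spatial patch grid is counted (any temporal patching done by the Conv3d is ignored). `gelu_tanh` implements the tanh approximation of GELU, which is what the identifier `gelu_pytorch_tanh` denotes.

```python
import math


def patches_per_frame(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in one frame."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of the GELU activation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Assumed 448x448 input: a larger patch means fewer patch tokens per frame.
old_patches = patches_per_frame(448, 448, 14)   # 32 * 32 = 1024
new_patches = patches_per_frame(448, 448, 16)   # 28 * 28 = 784
```

Under that assumed input size, moving from patch size 14 to 16 cuts the per-frame patch count from 1024 to 784, roughly a 23% reduction.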
Looking to Recruit Several Experts to Co-Build the Platform (4D Annotation, World Models, VLA, and Related Areas)
自动驾驶之心· 2025-09-23 23:32
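Business partners: 自动驾驶之心 is recruiting business partners. Our team plans to recruit 10 outstanding partners from China and abroad this year to take charge of autonomous-driving course development, paper-coaching business development, and hardware R&D. Main areas: if you work on large models or multimodal large models, diffusion models, VLA, end-to-end driving, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation with 3DGS, or large-model deployment and quantization-aware inference, you are welcome to join us. Requirements: a university ranked within the QS top 200 and a master's degree or above; candidates with top-conference publications are preferred. Benefits: shared autonomous-driving resources (job hunting, PhD applications, study-abroad recommendations, and so on); generous cash incentives; cooperation on and referrals to startup projects. Contact us: for more information, add us on WeChat with the note "institution/company + autonomous driving cooperation inquiry". ...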
8B Takes On 72B! The MiniCPM-V 4.5 Technical Report Is Officially Out
量子位· 2025-09-23 11:01
Core Viewpoint
- The technical report on MiniCPM-V 4.5, described as the industry's first multimodal model with high-refresh-rate video understanding, has been officially released, showcasing significant advances in video and document processing [1][2].
Group 1: Technical Innovations
- MiniCPM-V 4.5 introduces three key technologies: a unified 3D-Resampler architecture for high-density video compression, a unified OCR and knowledge learning paradigm for document processing, and a controllable hybrid fast/deep thinking multimodal reinforcement learning approach [2][8].
- The 3D-Resampler architecture achieves a 96x compression rate for visual tokens, allowing the model to process more video frames without increasing computational costs [11][12].
- The unified OCR and knowledge learning paradigm eliminates reliance on external parsing tools, significantly reducing data noise and engineering complexity, leading to superior performance in document understanding tasks [25][24].
Group 2: Model Performance
- MiniCPM-V 4.5 has received widespread acclaim since its open-source release, ranking second on HuggingFace's trending list, with over 220,000 downloads across major platforms [3][4].
- The model outperforms other leading models, including GPT-4o-latest and Qwen2.5-VL-72B, achieving state-of-the-art (SOTA) performance in various tasks while maintaining a parameter size of only 8 billion [34][36].
- In the OpenCompass evaluation, MiniCPM-V 4.5 achieved an average score of 77.0, demonstrating stronger vision-language capabilities than other models in its class [34][36].
Group 3: Efficiency and Cost Reduction
- The model's design allows for a significant reduction in training costs, with a 30% decrease in sampling expenses while maintaining high performance across both fast and deep thinking modes [29][30].
- The 3D-Resampler architecture not only enhances video processing efficiency but also ensures seamless knowledge transfer between image and video tasks, further optimizing resource utilization [11][12][14].
- The hybrid reinforcement learning approach balances the need for quick responses in everyday scenarios with the depth required for complex tasks, enhancing overall model reliability [27][32].
Group 4: Community and Recognition
- The MiniCPM series, developed by Tsinghua University's NLP lab and ModelBest (面壁智能), has gained significant academic and industrial recognition, with over 13 million downloads and numerous accolades [49].
- The model's contributions to the field have been acknowledged in prestigious publications and forums, highlighting its impact on multimodal AI research [49].
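A short back-of-the-envelope sketch shows what a 96x visual-token compression rate means for a fixed token budget. Only the 96x ratio comes from the summary; the group of 6 frames, the 448 × 448 resolution, the patch size of 14, and the 1024-token budget are illustrative assumptions, and all function names are hypothetical.

```python
def visual_tokens(frames: int, height: int, width: int, patch_size: int) -> int:
    """Patch tokens before compression for a group of frames."""
    return frames * (height // patch_size) * (width // patch_size)


def compressed_tokens(raw_tokens: int, compression_ratio: int) -> int:
    """Tokens left after compressing by the given ratio (rounded up)."""
    return -(-raw_tokens // compression_ratio)


def frames_in_budget(token_budget: int, tokens_per_group: int,
                     frames_per_group: int) -> int:
    """How many frames fit in a token budget, counting whole groups only."""
    return (token_budget // tokens_per_group) * frames_per_group


# Assumed setup: 6 frames of 448x448 with patch size 14.
raw = visual_tokens(frames=6, height=448, width=448, patch_size=14)   # 6144
packed = compressed_tokens(raw, compression_ratio=96)                  # 64

# Same assumed 1024-token budget, with and without compression.
frames_uncompressed = frames_in_budget(1024, raw // 6, 1)   # 1 frame
frames_compressed = frames_in_budget(1024, packed, 6)       # 96 frames
```

Under these assumptions the same budget that holds one uncompressed frame holds 96 compressed ones, which is the sense in which the model can see more frames at the same cost.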
Alibaba Drops Three Open-Source Bombshells Overnight, Sweeping 32 Open-Source SOTA Results
3 6 Ke· 2025-09-23 09:06
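Qwen3-Omni can seamlessly process text, image, audio, and video inputs, and it generates text and natural speech output simultaneously through real-time streaming responses. Across 36 audio and audio-visual benchmarks it took 32 open-source SOTA results and 22 overall SOTA results, surpassing strong closed-source models such as Gemini-2.5-Pro, Seed-ASR, and GPT-4o-Transcribe, while its image and text performance also reaches SOTA among models of the same size. Qwen3-TTS supports 17 voices and 10 languages, and in evaluations of speech stability and voice similarity it surpasses mainstream products such as SeedTTS and GPT-4o-Audio-Preview. The headline update in Qwen-Image-Edit-2509 is support for multi-image editing: it can combine person + person, person + object, and similar pairings drawn from different images. Alibaba open-sourced Qwen3-Omni-30B-A3B-Instruct (instruction following), Qwen3-Omni-30B-A3B-Thinking (reasoning), and the general-purpose audio captioner Qwen3-Omni-30B-A3B-Captioner. 智东西, September 23: late at night, Alibaba's Tongyi large model team made three big moves in a row, open-sourcing the native omni-modal large model Qwen3-Omni, the speech generation model Qwen3-TTS, and the image editing model Qwen-Image-Edit- ...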
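Optical Modules Charge Again, 中际旭创 (Zhongji Innolight) Rises Over 4%! Nvidia Plans to Invest Up to US$100 Billion in OpenAI! Cloud Computing ETF 汇添富 (159273) Briefly Surges Over 2%!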
Xin Lang Cai Jing· 2025-09-23 02:41
Group 1
- The computing power sector surged, driven by overseas news and strategic partnerships, particularly the one between Nvidia and OpenAI [1][3]
- Nvidia and OpenAI have announced a strategic collaboration to build and deploy at least 10 gigawatts of AI data centers, utilizing millions of Nvidia GPUs, with Nvidia potentially investing up to $100 billion [3]
- The cloud computing ETF 汇添富 (China Universal, 159273) has seen a net inflow of over 700 million yuan in the past 20 days, indicating strong investor interest [1]
Group 2
- The optical module sector is experiencing a boom due to rapid iterations of Nvidia GPUs and self-developed ASICs, leading to a doubling of bandwidth capacity with each generation [5]
- The market recognizes a conversion ratio of GPU to optical modules at 1:2.5, with potential future ratios reaching 1:11.5 in certain applications [5]
- The demand for computing power is driving significant capital expenditures among global cloud providers, with a projected 50% increase in capital spending to $333.8 billion by 2025 [6]
Group 3
- The expansion of computing clusters, referred to as "ten thousand card clusters," is seen as a ticket to participate in the current model competition, with major operators and internet companies increasing their investments [7]
- The cloud computing ETF 汇添富 (China Universal, 159273) aims to capture the growth opportunities in AI-driven cloud computing, covering a wide range of sectors including hardware, cloud services, and data center operations [7]
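The ratios and growth figures above lend themselves to a quick sanity check. The sketch below is illustrative arithmetic only: it assumes the 50% figure is a year-over-year increase that lands on $333.8 billion, the 10,000-GPU cluster size is a made-up example, and both function names are hypothetical.

```python
def prior_value(new_value: float, growth_rate: float) -> float:
    """Baseline implied by a value reached after growing by growth_rate."""
    return new_value / (1.0 + growth_rate)


def optical_modules_needed(gpus: int, modules_per_gpu: float) -> float:
    """Optical modules implied by a GPU count and a GPU-to-module ratio."""
    return gpus * modules_per_gpu


# Capex: a 50% increase to $333.8B implies a baseline of about $222.5B.
baseline_capex_bn = prior_value(333.8, 0.50)

# Assumed 10,000-GPU cluster at the two ratios quoted in the summary.
modules_today = optical_modules_needed(10_000, 2.5)     # 1:2.5 ratio
modules_future = optical_modules_needed(10_000, 11.5)   # 1:11.5 ratio
```

At the quoted ratios, the same hypothetical cluster would need 25,000 modules today and 115,000 in the higher-ratio case, a 4.6x difference in module demand per GPU.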