Vision
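NVIDIA and HKU team up to overhaul visual attention: GSPN speeds up high-resolution generation by more than 84x
QbitAI · 2025-06-10 05:16
Contributed by the GSPN team. QbitAI | WeChat official account QbitAI
2D linear propagation: from row/column parallelism to dense connections
Visual attention has seen another breakthrough, this time from the University of Hong Kong (HKU) and NVIDIA. Transformer self-attention performs well in NLP and computer vision: it captures long-range dependencies and builds deep context. On high-resolution images, however, conventional self-attention runs into two major problems: Linear attention and methods such as Mamba can reduce complexity to O(N), but they still treat an image as a one-dimensional sequence and cannot truly exploit two-dimensional spatial information. In response, HKU and NVIDIA jointly introduced the Generalized Spatial Propagation Network (GSPN). GSPN uses 2D linear propagation together with a "stability-context condition" to cut computation from O(N²) or O(N) down to the order of √N while fully preserving the spatial coherence of the image. This brings a large efficiency gain and sets new performance records on several vision tasks.
Spatial coherence together with computational efficiency
GSPN's core techniques are 2D linear propagation and the stability-context condition; on this basis, existing attention mechanisms compare with GSPN as follows: As GSPN's core component, 2D linear propagation has two key points: Line-scan mechanism. For a 2D image, 2D linear propagation processes rows or columns in sequence, following a linear recurrence in which the hidden state is computed from the previous row's hidden state and the current input:
Huge computational cost: ...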
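The article's own formula is cut off, so the sketch below is only one plausible reading of the line-scan recurrence described above: each row's hidden state mixes the previous row's hidden state with the gated current input. The three-neighbor connection pattern, the normalization of weights to sum to one (a stand-in for the stability-context condition, which the article does not define), and the names `propagate_rows`, `w_raw` and `lam` are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: row-by-row linear propagation over a 2D grid.
# The three-neighbor, sum-to-one weights are an assumption, not from the article.

def propagate_rows(x, w_raw, lam):
    """Scan an H x W grid from top to bottom.

    x     : H x W list of input values
    w_raw : H x W x 3 non-negative raw weights linking each pixel to its
            upper-left, upper and upper-right neighbors in the previous row
    lam   : H x W per-pixel input gains
    Returns the H x W grid of hidden states.
    """
    height, width = len(x), len(x[0])
    hidden = [[0.0] * width for _ in range(height)]
    for i in range(height):
        # Pixels within one row are independent of each other, so this inner
        # loop could run in parallel; only the loop over rows is sequential.
        for j in range(width):
            acc = lam[i][j] * x[i][j]
            if i > 0:
                # Keep only neighbors that fall inside the grid.
                taps = [(w_raw[i][j][k], j + k - 1) for k in range(3)
                        if 0 <= j + k - 1 < width]
                total = sum(w for w, _ in taps)
                if total > 0:
                    # Normalize so the weights sum to 1, which keeps the
                    # recurrence from growing without bound.
                    acc += sum(w / total * hidden[i - 1][c] for w, c in taps)
            hidden[i][j] = acc
    return hidden


if __name__ == "__main__":
    grid = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    weights = [[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(2)]
    gains = [[1.0] * 3 for _ in range(2)]
    print(propagate_rows(grid, weights, gains))
```

Under this reading, a √N × √N image needs only about √N sequential row steps, which may be what the √N figure quoted in the summary refers to.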
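Ledong Robotics makes a run at an H-share listing: robot LiDAR enters a three-way battle
As rising shipments of consumer robots lift demand for supporting machine-vision components, Ledong Robotics' revenue grew rapidly in 2024. Backed by leading robot makers: according to the prospectus, which cites China Insights Consultancy, Ledong Robotics is an intelligent-robotics company centered on visual perception technology. In 2024 the company's technology went into a total of 6 million intelligent robots. Of these, shipments of DTOF (direct time-of-flight, one LiDAR technology route) LiDAR units reached 720,000, the highest in the world. Its products reach end users in 50 countries and it serves 300 downstream customers, including seven of the world's ten largest service-robot companies and the world's five largest commercial-robot companies. Ecovacs (603486) and Narwal, two major Chinese cleaning-robot brands, are also Ledong Robotics customers. Despite the "robotics" in its name, the company's business consists mainly of selling LiDAR modules; it is a robotics company built around "visual perception technology". Unlike Hesai and RoboSense, which are known for automotive LiDAR, Ledong Robotics' LiDAR products target home, commercial and industrial robot applications. Compared with automotive LiDAR, LiDAR for home robots needs less long-range detection depth but higher short-range accuracy. That product profile also means lower margins than automotive LiDAR but larger volumes. ...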
First pixel-space reasoning: a 7B model leads GPT-4o, letting VLMs use "eyes and brain together" like humans
QbitAI · 2025-06-09 09:27
Core Viewpoint - The article discusses the transition of Visual Language Models (VLM) from "perception" to "cognition," highlighting the introduction of "Pixel-Space Reasoning" which allows models to interact with visual information directly at the pixel level, enhancing their understanding and reasoning capabilities [1][2][3]. Group 1: Key Developments in VLM - The current mainstream VLMs are limited by their reliance on text tokens, which can lead to loss of critical information in high-resolution images and dynamic video scenes [2][4]. - "Pixel-Space Reasoning" enables models to perform visual operations directly, allowing for a more human-like interaction with visual data [3][6]. - This new reasoning paradigm shifts the focus from text-mediated understanding to native visual operations, enhancing the model's ability to capture spatial relationships and dynamic details [6][7]. Group 2: Overcoming Learning Challenges - The research team identified a "cognitive inertia" challenge where the model's established text reasoning capabilities hinder the development of new pixel operation skills, creating a "learning trap" [8][9]. - To address this, a reinforcement learning framework was designed that combines intrinsic curiosity incentives with extrinsic correctness rewards, encouraging the model to explore visual operations [9][12]. - The framework includes constraints to ensure a minimum rate of pixel-space reasoning and to balance exploration with computational efficiency [10][11]. Group 3: Performance Validation - The Pixel-Reasoner, based on the Qwen2.5-VL-7B model, achieved impressive results across four visual reasoning benchmarks, outperforming models like GPT-4o and Gemini-2.5-Pro [13][19]. - Specifically, it achieved an accuracy of 84.3% on the V* Bench, significantly higher than its competitors [13]. - The model demonstrated a 73.8% accuracy on TallyQA-Complex, showcasing its ability to differentiate between similar objects in images [19][20]. 
Group 4: Future Implications - The research indicates that pixel-space reasoning is not a replacement for text reasoning but rather a complementary pathway for VLMs, enabling a dual-track understanding of the world [21]. - As multi-modal reasoning capabilities evolve, the industry is moving towards a future where machines can "see more clearly and think more deeply" [21].
A revolution of the gaze: where Girl with a Pearl Earring sits in art history
Jing Ji Guan Cha Bao· 2025-06-09 06:02
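(Original title: A revolution of the gaze: where Girl with a Pearl Earring sits in art history) In the afterglow of the Golden Age in Delft, the Netherlands, Johannes Vermeer's Girl with a Pearl Earring (1665) is like a mysterious pearl set in the long river of art history; its faint glow has crossed three and a half centuries and keeps rewriting how people understand portraiture. This seemingly everyday portrait of a girl is in fact a microcosm of the transformation of 17th-century Dutch art. It carries the visual revolution from the religious altar to the secular world and, between the theatricality of the Baroque and the rationalization of the Enlightenment, opens a new dimension of the gaze. The lenses and camera obscura fragments left in Vermeer's studio reveal his explorations as a "light-and-shadow engineer". In the painting, light cuts in obliquely from the left and forms a precise boundary between light and dark on the girl's right cheek. This distribution of light follows the laws of optics and, unlike Caravaggio's theatrical chiaroscuro, comes closer to naturalistic visual truth. X-ray scans show that Vermeer used camera obscura projection when laying down the underdrawing; the angle of the shadow on the bridge of the girl's nose closely matches the geometric calculation of camera obscura imaging. In color, he built a precise visual spectrum: the ultramarine of the turban and the metallic reflection of the pearl are complementary, and the flesh tones of the neck are mixed with extremely fine ultramarine particles that give a subtle cool cast under certain light. This delicate handling of light and color reflects the echo of the 17th-century Scientific Revolution in art. The girl's turned head and backward glance form a confrontation of gazes rarely seen in art history. In traditional portraiture, the sitter's eyes ...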
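A post-85 Tianjin University lecturer turned founder: net profit jumps 295% as the company heads for an IPO
Chuang Ye Bang · 2025-06-09 02:58
The "IPO Full Watch" column focuses on companies going public for the first time, reports on entrepreneurs' founding journeys and success stories, analyzes business models and operating results, and reveals how VC, CVC and other investors have backed each company. Author | Xue Haohao. Editor | Guan Ju. Image source | Midjourney. Isvision (Hangzhou) Technology Co., Ltd. (易思维, hereafter "Isvision"), a leading machine-vision company, recently filed its STAR Market prospectus, making it the fourth STAR Market IPO application accepted this year. Isvision is a machine-vision equipment supplier rooted in the transportation sector, with products spanning the upstream, midstream and downstream of the machine-vision industry chain. It covers upstream core components such as vision sensors, light sources and control electronics; midstream machine-vision systems for measurement, guidance, inspection and recognition; and downstream integrated application solutions for automotive manufacturing, rail transit and aviation. According to Frost & Sullivan, the company's shares of China's machine-vision markets for automotive manufacturing and for complete-vehicle manufacturing are 13.7% and 22.5% respectively, both the highest in the industry. From Tianjin University lecturer to founder, and now first in the industry: Guo Yin, now 39, devoted himself to academic research on precision instruments before founding Isvision. [Image: Guo Yin (center); source: official WeChat account] [Image: Isvision's business coverage; source: prospectus] In 2010, Guo Yin entered Tianjin University to pursue a PhD in instrument science and technology. He then further deepened his academic expertise in precision instruments and joined ...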
Research on the computer "eye" takes an important step: artificial synapse successfully mimics human color vision
Ke Ji Ri Bao· 2025-06-08 23:24
Core Insights - A team from Tokyo University of Science has developed a self-powered artificial synapse with high color discrimination ability, closely approaching human eye capabilities, marking a significant advancement in machine vision research [1][2] - The rapid development of artificial intelligence has increased the demands on machine vision, which traditionally consumes substantial power, storage, and computational resources [1] - The new artificial synapse system integrates two types of dye-sensitized solar cells, enabling it to convert solar energy directly for power, making it suitable for energy-efficient edge computing applications [2] Group 1 - The artificial synapse can distinguish colors within the visible spectrum at a resolution of 10 nanometers, comparable to human vision [2] - It exhibits bipolar response characteristics, generating positive voltage under blue light and negative voltage under red light, allowing it to perform complex logical operations typically requiring multiple traditional optoelectronic components [2] - The system demonstrated an 82% accuracy rate in classifying up to 18 combinations of colors and human actions using a single device [2] Group 2 - This technology has the potential to provide human-like vision capabilities to everyday devices, with broad application prospects in autonomous driving, medical health devices, and consumer electronics [2]
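Taking apart Tesla's robot supply chain: the bubble and the hope seen by more than 30 practitioners
Author | Li Zinan. Source | LatePost (晚点LatePost). Lead: It reinvented the car but has not yet built a usable wheel. In mid-April this year, Tesla's procurement team arrived at a supplier's plant in Ningbo for the final factory audit before mass production of its humanoid robot. In a car at the gate, a watcher matched the license plate, took a photo and sent it to his "upline": "Tesla is here for the audit." It was worth the trouble. On the next trading day, the company's shares hit the daily limit-up, as usual. Since Tesla first publicly showed its humanoid robot in October 2022, the A-share robotics concept sector has risen 93%, while the CSI 300 index rose only about 1% over the same period. A week later, thousands of assembled core components were loaded onto a ship in Ningbo and, bearing steep tariffs, sent to Tesla's factory in Fremont, California. Nothing here looks like a trillion-scale concept sector. In the robot manufacturing area on the second floor of the Fremont factory, robots without arms or heads hang from racks on iron chains. After testing the parts, engineers assemble them by hand into new humanoid robots. Wires and plastic packaging are scattered on the floor. Since Tesla unveiled its robot in 2022, venture investors worldwide, Tesla and its suppliers have put more than 100 billion yuan into it. So far, humanoid robots are more handmade than Rolex mechanical watches. As far as we know, Tesla ...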
Ledong Robotics: net loss of 56 million yuan in 2024; lawn-mower business yet to break through
He Xun Cai Jing· 2025-06-08 11:31
Group 1 - The core viewpoint of the article highlights the rapid capitalization of the robotics industry, with Shenzhen Ledong Robotics Co., Ltd. filing for an IPO in Hong Kong, presenting both challenges and opportunities [1] - Ledong Robotics focuses on visual perception products, particularly laser radar, which generated revenue of 341 million yuan in 2024, accounting for over 70% of its total revenue [1] - The company is recognized as the largest intelligent robotics company globally based on visual perception technology, with over 6 million smart robots equipped with its technology in the same year [1] Group 2 - Despite its leading position, Ledong Robotics faces challenges due to low profit margins, which are less than one-third of those of its competitor, A-share company Obsidian [1] - To address this, the company is increasing its focus on smart lawnmower robots, having sold over 15,000 units in 2025 so far, surpassing the total sales for 2024 [1] - The smart lawnmower market is projected to reach a penetration rate of 17% and a market size of approximately 47.6 billion yuan by 2029, indicating significant growth potential [1] Group 3 - Ledong Robotics' visual perception products generated revenue of 341 million yuan in 2024, a year-on-year increase of over 100%, but the gross margin was only 15.2%, significantly lower than that of its peers [1] - The gross margin for the algorithm module, a relatively higher-margin product, reached 31.3% in 2024, but revenue declined by over 20% compared to 2022 due to price drops and increased in-house development by end-user companies [1] - The overall gross margin for Ledong Robotics dropped to 19.5% in 2024, reflecting the competitive pressures in the market [1] Group 4 - Ledong Robotics reported a net loss of 56 million yuan in 2024, and the upcoming IPO aims to raise funds to upgrade its lawnmower robot products, which is a focal point for market observers [1]
Backed by Alibaba's CEO and focused on sensors, Ledong Robotics seeks Hong Kong IPO funds to go "mow lawns"
Hua Er Jie Jian Wen· 2025-06-08 08:09
Core Viewpoint - The rapid capitalization of robotics companies is being driven by the growing interest in the sector, with Shenzhen Ledong Robotics Co., Ltd. filing for an IPO to enhance its market position and product offerings [1][2]. Group 1: Company Overview - Ledong Robotics focuses on visual perception products, particularly laser radar, which generated revenue of 341 million yuan in 2024, accounting for over 70% of its total revenue [2][11]. - The company is positioned as the largest intelligent robotics company globally based on visual perception technology, with over 6 million units equipped with its technology in 2024 [3]. - Despite its leading position, Ledong Robotics has a gross margin of only 15.2%, significantly lower than competitors like Orbbec and Aoptix, which have margins exceeding 60% [12][20]. Group 2: Product and Market Challenges - Ledong Robotics' visual perception products include a comprehensive range of sensors and algorithms, but the company faces challenges in maintaining pricing power due to its position as a 2B supplier [10][12]. - The company has acknowledged that it has had to lower prices to maintain market share in the home robotics sector, indicating a lack of bargaining power [14][15]. - The algorithm module segment, which has a higher gross margin of 31.3%, is experiencing a decline in revenue, attributed to price drops and increased competition from terminal robotics companies developing their own algorithms [16][18][19]. Group 3: New Business Ventures - In response to challenges in its B2B business, Ledong Robotics is entering the consumer market with its own line of intelligent lawn mowers, which it believes has significant growth potential in mature markets [22][23]. - The company launched its first intelligent lawn mower in 2024, achieving sales of over 10,000 units and generating 2.3 million yuan in revenue [24].
- The lawn mower segment boasts a gross margin of 33.6%, which could help stabilize the overall gross margin of the company [26]. Group 4: Competitive Landscape - The intelligent lawn mower market is becoming increasingly competitive, with established players like Ninebot and Roborock entering the space, posing additional challenges for Ledong Robotics [29][31][32]. - Amid these competitive pressures, Ledong Robotics reported a net loss of 56 million yuan in 2024, highlighting the financial challenges it faces [33]. - The upcoming IPO is expected to provide necessary funding to enhance its product offerings and improve market competitiveness [34].
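Multimodal models take on the Beijing and Hangzhou metro maps: o3 scores well but still trails humans
QbitAI · 2025-06-07 05:02
Contributed by the ReasonMap team. QbitAI | WeChat official account QbitAI. In recent years, large language models (LLMs) and multimodal large language models (MLLMs) have made breakthrough progress on many scene-understanding and complex reasoning tasks. One key question still deserves asking: can MLLMs really "read" an image? In particular, when faced with structurally complex, detail-dense images, do they have fine-grained visual understanding and spatial reasoning ability, for example when challenged with a high-resolution metro map? To find out, a team from Westlake University, the National University of Singapore, Zhejiang University and Huazhong University of Science and Technology proposed a new benchmark, ReasonMap. The Beijing and Hangzhou metro maps evidently stumped a large share of the models. ReasonMap is the first multimodal reasoning benchmark focused on high-resolution transit maps (mainly metro maps), designed to evaluate how well large models understand fine-grained, structured spatial information in images. The results show that today's mainstream open-source multimodal models hit a clear performance bottleneck on ReasonMap and often show visual confusion or omit stations in cross-line route planning. Closed-source reasoning models post-trained with reinforcement learning (such as GPT-o3) significantly outperform existing open-source models on several dimensions, but a clear gap to human performance remains. Across metro maps from different countries and regions, four representative MLLMs (Qwen2.5-VL-72B-I (blue), I ...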
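To make the failure modes named above concrete (station omissions and confusion in cross-line route planning), here is a toy route checker. It is not ReasonMap's data or scoring method, which the summary does not describe; the two-line map `LINES`, the route format and the function `check_route` are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: a hypothetical toy metro map and checker,
# not the benchmark's data or its scoring procedure.

LINES = {
    "Line 1": ["A", "B", "C", "D"],
    "Line 2": ["E", "C", "F"],
}


def check_route(segments, start, end):
    """Validate a route given as [(line_name, [stations...]), ...].

    Returns "ok" or a short description of the first problem found.
    """
    if not segments or any(not stops for _, stops in segments):
        return "empty route"
    if segments[0][1][0] != start or segments[-1][1][-1] != end:
        return "wrong endpoints"
    previous_last = None
    for line, stops in segments:
        track = LINES.get(line)
        if track is None:
            return f"unknown line: {line}"
        # A transfer must happen at the station where the last segment ended.
        if previous_last is not None and stops[0] != previous_last:
            return "broken transfer"
        for station in stops:
            if station not in track:
                return f"station {station} not on {line}"
        # Consecutive stops must be adjacent on the line (no skipped stations).
        for a, b in zip(stops, stops[1:]):
            if abs(track.index(a) - track.index(b)) != 1:
                return f"skipped station between {a} and {b}"
        previous_last = stops[-1]
    return "ok"


if __name__ == "__main__":
    route = [("Line 1", ["A", "B", "C"]), ("Line 2", ["C", "F"])]
    print(check_route(route, "A", "F"))
```

A model answer that jumps from A straight to C would be flagged as a skipped station, and one that changes lines at a station other than the shared one would be flagged as a broken transfer.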