Vision-Language-Action Models
Tencent Research Institute AI Express 20250709
Tencent Research Institute · 2025-07-08 15:50
Group 1
- Ruoming Pang, head of Apple's foundation models team, is reported to be joining Meta's new AI team with annual compensation in the tens of millions [1]
- Pang's departure may have been influenced by internal discussions at Apple about bringing in third-party models such as OpenAI's, which hurt team morale [1]
- Apple's AI team will be reorganized under Zhifeng Chen and move to a multi-layer management structure [1]

Group 2
- Microsoft has launched Deep Research in public preview, an advanced AI research tool built on the o3 model and Bing search [2]
- It can automatically break down complex problems, gather up-to-date authoritative information from the web, and generate auditable research reports [2]
- An API is available for integration into applications, supporting enterprise AI platforms in fields such as research, finance, and healthcare [2]

Group 3
- Alibaba has open-sourced HumanOmniV2, a multimodal reasoning model that captures hidden information in videos and understands "subtext" [3]
- The model uses a mandatory context-summarization mechanism, a multi-dimensional reward system driven by large models, and GRPO-based optimization training [3]
- Alibaba also introduced the IntentBench evaluation benchmark, on which HumanOmniV2 reaches 69.33% accuracy, excelling at understanding complex human intentions [3]

Group 4
- PaddleOCR 3.1 has been released; with Wenxin 4.5 (ERNIE 4.5), text recognition accuracy across 37 languages improves by more than 30%, and high-quality automatic data labeling is supported [4]
- A new pipeline, PP-DocTranslation, combines PP-StructureV3 and Wenxin 4.5 to translate Markdown, PDF, and image documents, with support for custom professional terminology [4]

Group 5
- A controversy has emerged over hidden instructions embedded in academic papers to induce AI to give high scores, with several top universities implicated [6]
- Xie Saining, a co-author of one such paper, acknowledged responsibility and apologized, clarifying that he does not endorse such practices [6]
- The incident has sparked discussion of academic ethics in the AI era, highlighting the lack of unified standards for AI-assisted review and the need for reform [6]

Group 6
- Vision-Language-Action (VLA) models are becoming a core technology for embodied intelligence in 2025, iterating rapidly since Google's RT-2 breakthrough [7]
- China's Zhihui Square has partnered with top universities to launch FiS-VLA, which embeds a "fast system" inside a "slow system" to address the trade-off between robot control efficiency and reasoning capability [7]
- FiS-VLA improves success rates by 8% on simulation tasks and by 11% in real-world environments, with a control frequency of 21.9 Hz, 1.6 times that of the open-source model π0 [7]

Group 7
- YouTube co-founder Steve Chen (Chen Shijun) discussed AI entrepreneurship and long-termism with the Manus team, emphasizing the value of rapid experimentation and risk-taking [8]
- His recommendations for AI startups include using first-mover advantage to retain users, building compounding network effects, and exploring areas that larger companies avoid, all within legal boundaries [8]
- Key decisions at YouTube included prioritizing user growth over immediate monetization, establishing transparent core metrics, and developing a creator-friendly advertising model while focusing on the "passive experience" of recommendation systems [8]

Group 8
- The key shift in user acquisition for AI products: a product that does not generate social engagement within its first 48 hours may fail, making virality a survival threshold rather than a bonus [9]
- Base44's $80 million sale was built on involving users in the development process, encouraging them to share their creations, and deliberately choosing LinkedIn as the distribution platform, forming a closed loop of building, showcasing, and sharing [9]
- The distribution paradigm for AI startups is evolving: product development becomes a public showcase, niche native creators prove more effective than influencers, and growth metrics become assets for promotion, a shift from "closed-door development" to "public collaboration" [9]

Group 9
- U.S. universities are reshaping computer science education; the CS major may become more humanities-oriented, emphasizing computational thinking and AI literacy over traditional programming skills [10]
- The "Level Up AI" initiative has launched an 18-month curriculum overhaul, in which the programming language of the future may be "Human," with students completing programming tasks by interacting with AI [10]
- Traditional humanities classrooms face an assessment crisis: educators struggle to identify AI-generated content, prompting a return to handwritten assignments and the development of anti-cheating systems, and raising concerns that students' over-reliance on AI may affect their cognitive abilities [10]
Learning end-to-end large models, but still unclear on the difference between VLM and VLA...
自动驾驶之心 · 2025-06-19 11:54
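Large models have swept through every field, and in intelligent driving VLMs are gradually reaching mass-production deployment. Many newcomers say that large models are now too important to ignore and that they want to start learning, but are unsure which direction to take.

The following question came from a member of the Knowledge Planet (知识星球) community:

What is the difference between VLA and VLM? Which one is recommended to learn now?

The two are two sides of the same coin:

1. A VLM can be understood as the foundational capability: general detection, question answering, spatial understanding, chain-of-thought reasoning, and so on.
2. A VLA focuses more on the Action capability; its end goal is to act. In autonomous driving this can be understood as predicting the ego vehicle's trajectory. At the same time, the predicted trajectory should match human understanding as closely as possible, which in turn depends on basic vision and language capabilities. For example, to explain a behavior, one can reason through it step by step with a chain of thought, and that relies on basic autonomous-driving perception (where the pedestrians are, 2D coordinates, 3D positions, and so on).

The two cannot be learned entirely independently. In my view, the right approach is to learn VLM first and then extend to VLA. Connecting a VLM to a diffusion model is enough to predict trajectories, that is, the Action. This is where the benefit of multi-modal trajectories comes in: in an uncertain environment a single-mode predictor is limited, while a multi-modal one has a higher ceiling.

Finally, everyone is welcome to join the Knowledge Planet, where the in-depth materials are ...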
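
The claim about single-mode versus multi-modal trajectories can be illustrated with a toy calculation. The sketch below is not from the original post: the fork scene, the two hand-written trajectories, and the use of average displacement error (ADE) and best-of-K minADE as the score are all assumptions made for illustration, and no actual VLM or diffusion model is involved. It only shows why a predictor that must commit to one trajectory can end up far from every plausible future, while one that keeps several candidates can match the future that actually occurs.

```python
import math

def ade(pred, truth):
    """Average displacement error between two equal-length 2D trajectories."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def min_ade(modes, truth):
    """Best-of-K error: score only the candidate closest to the ground truth."""
    return min(ade(m, truth) for m in modes)

# Hypothetical scene: at a fork, the ego vehicle may plausibly go left or right.
left = [(float(t), 0.5 * t) for t in range(1, 5)]
right = [(float(t), -0.5 * t) for t in range(1, 5)]

# Assumption: a single-mode regressor trained on both outcomes drifts toward
# their average, which here is a straight line that matches neither future.
single = [((l[0] + r[0]) / 2, (l[1] + r[1]) / 2) for l, r in zip(left, right)]

# Stand-in for a multi-modal predictor (e.g. samples from a diffusion head):
# it keeps both candidate futures instead of averaging them.
multi = [left, right]

for name, truth in (("left", left), ("right", right)):
    print(name,
          "single-mode ADE:", ade(single, truth),
          "multi-modal minADE:", min_ade(multi, truth))
```

In this toy setup the averaged single-mode trajectory has the same nonzero error whichever branch the vehicle takes, while the two-mode set contains an exact match for each branch. Real predictors would of course produce imperfect modes, so the gap would be smaller than in this idealized case.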
DeepRoute.ai's VLA model is set for mass production in Q3: can it break through the market and technical barriers?
Nan Fang Du Shi Bao · 2025-06-13 15:04
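Recently, at the 2025 Volcano Engine Force conference, autonomous-driving company DeepRoute.ai (元戎启行) announced that its VLA model will reach the consumer market in the third quarter of 2025 and is expected to ship in five vehicle models within the year.

At the event, DeepRoute.ai CEO Zhou Guang made a high-profile demonstration of the VLA model's four "superpowers": see-through handling of blind spots, know-it-all recognition of irregular obstacles, translator-grade parsing of road signs, and responsive voice control of the vehicle. The demonstration drew strong interest from the industry.

DeepRoute.ai is no newcomer to intelligent driving. Since its founding in 2018, the Shenzhen-headquartered technology company has worked intensively on autonomous driving and connected-vehicle technology.

DeepRoute.ai also pays close attention to cost control in its R&D. In its cooperation with Qualcomm, for example, technical optimization allowed complex scenarios that originally required more compute to run on the 100 TOPS Snapdragon SA8650 platform, substantially lowering the price of the driving solution.

How can it carve out enough market space?

In intelligent assisted driving, the industry has entered a stage of fierce competition for market share. Many solution providers positioned themselves early, signed partnerships with automakers, and secured a large number of vehicle models.

DeepRoute.ai is therefore preparing to enter with a VLA model that will only reach the market in the third quarter of this year. It must break through market barriers in a short time and, surrounded by brands such as Huawei, Horizon Robotics, and Momenta, quickly raise its visibility and product recognition, which is undoubtedly a formidable task.

This company ...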
Dissecting Tesla's robot supply chain: the bubble and the hope seen by more than 30 practitioners
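Author | Li Zinan  Source | 晚点LatePost

Lede: It has reinvented the car, but has not yet built a usable wheel.

In mid-April this year, Tesla's procurement team arrived at a supplier's plant in Ningbo for the final factory audit before mass production of its humanoid robot. In a car parked at the gate, a lookout matched the license plate, took a photo, and sent it to his "handler": "Tesla is here for the audit."

It was worth the trouble. On the next trading day, the company's stock hit the daily limit-up as usual. Since Tesla first publicly showed its humanoid robot in October 2022, the A-share robotics concept sector has risen 93%, while the CSI 300 index rose only about 1% over the same period.

A week later, thousands of fully assembled core components were loaded onto a ship in Ningbo and, despite steep tariffs, sent to Tesla's factory in Fremont, California.

Nothing there looks like a trillion-scale concept sector. In the robot manufacturing area on the second floor of the Fremont factory, robots without arms or heads hang from racks on iron chains. After engineers finish testing the parts, they assemble them by hand into new humanoid robots. Wires and plastic packaging are scattered across the floor.

Since Tesla unveiled its robot in 2022, venture investors worldwide, Tesla, and its suppliers have put more than 100 billion yuan into the effort. So far, humanoid robot production is more of a handcraft than making Rolex mechanical watches. According to what we have learned, Tesla's ...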
Embodied intelligence: a scientific expedition that demands humility and patience
Robot猎场备忘录· 2025-05-20 05:01
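Recently, Zhou Boyu, an assistant professor at Southern University of Science and Technology, laid out several thoughts on embodied intelligence on his Zhihu account "周指导BoyuZhou". Many of his views coincide with the editor's, so we are reposting them here:

First, it should be acknowledged that embodied intelligence has indeed injected new research energy into robotics and may push past the current performance ceiling of robots. Many admirable young scholars have emerged in the embodied field, and I will not name them one by one here.

Because embodied intelligence and robotics are naturally connected, this article takes my personal research perspective and shares, in an open spirit, observations and reflections from a robotics background. I hope readers will discuss it in the same open spirit. I especially oppose stirring up trouble and provoking a factional fight between Robotics and the embodied camp. The point of discussion is to advance science, not to rank one side above the other.

I. Disciplines need not "claim the throne"; science should coexist

Some argue that a considerable part of traditional robotics research focuses on "special" robots or "special" tasks, and that although this kind of "special-task research" is useful to science ...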