A 10x productivity boost: AI is upending software development, and these five lessons are the key dividing line
36Kr· 2025-07-04 02:15
Core Insights
- AI tools are accelerating the software development process while exposing significant capability gaps among teams, leading to output differences of tenfold or more [1]
- The concept of "AI-native development" requires a complete redesign of the development system, integrating AI at every stage from prototyping to deployment [1]
- The conversation with Cedric Ith, founder of Perceptron AI, highlights the need for developers to collaborate effectively with AI, focusing on what successful teams do right [1][2]

Group 1: Key Experiences from Cedric
- Taste is the new competitive advantage: in an era where AI can generate code rapidly, the focus shifts from technical skill to design thinking and product intuition [3]
- The ability to ask precise questions and create delightful user experiences is becoming the new barrier to entry in software development [3]
- AI is redefining the design process, letting designers explore numerous concepts quickly and generate user-centric solutions [3]

Group 2: New Design Paradigms
- Natural language is emerging as a primary design interface, shifting the designer's role from creating visuals to articulating product structure through language [4][5]
- Designers are developing a "design vocabulary" to communicate effectively with AI, enabling rapid prototyping that previously took engineers days to complete [5][6]
- The ability to break complex requests down into clear, executable language is becoming essential for effective collaboration with AI [6]

Group 3: The Rise of Design Engineers
- The traditional boundary between design and engineering is dissolving; designers can now contribute directly to code and work across the entire tech stack [7][8]
- This shift improves efficiency and redefines how products are made, as designers gain control over the entire delivery process [8][9]
- The iterative speed of design and development has increased significantly, compressing the time between design review and implementation from days to hours [10]

Group 4: AI-Native Design Principles
- Key principles for AI product design include reducing cognitive load, accepting non-determinism, and making AI reasoning processes transparent [11][12][13]
- The design focus is shifting from user execution to user orchestration, requiring designs that help users coordinate multiple intelligent agents [14]
- Teams adopting these principles early will create more intuitive and trustworthy AI experiences [14]

Group 5: Organizational Adaptation in the AI Era
- Organizations must shift from building perfect products to becoming rapid learning organizations to keep pace with the fast-evolving AI landscape [15][16]
- Cedric emphasizes quickly producing high-fidelity prototypes to gain internal buy-in, making design a catalyst for organizational change [16]
- The entire product development cycle is being compressed, leading to unprecedented innovation density [16]

Group 6: Cedric's AI Design Stack
- The design stack pairs Figma for visual design, v0 for defining dynamic behavior, and Cursor for code-level adjustments, enabling seamless transitions between design and engineering [17]
- Component libraries such as Shadcn and Tailwind give the AI standard semantics to work with, reducing the risk of hallucinations in code generation [17]
Lossless acceleration for vision-language model inference: easily pruning redundant visual tokens | Tencent AI Lab
量子位· 2025-07-04 01:42
Contributed by the VScan team
量子位 | WeChat official account QbitAI

Multi-image, long-video, and fine-grained perception are making large vision-language models (LVLMs) ever smarter, but also increasingly overloaded: the surge in visual token counts drives inference costs sharply upward and is becoming the biggest compute bottleneck for scaling multimodal intelligence.

To solve this problem, Tencent AI Lab and CMU jointly propose a new solution, VScan. The method targets the inference-stage efficiency bottleneck of large vision-language models: through a carefully designed two-stage visual token pruning mechanism, it achieves up to a 2.91x inference speedup with almost no loss in performance. It requires no changes to the model architecture and no retraining, and it is compatible with FlashAttention, giving the industry a lightweight, general, plug-and-play inference acceleration scheme.

To handle more complex and richer visual inputs, existing LVLMs often need to encode visual information far beyond the scale of their text tokens. For example, LLaVA-NeXT introduces up to 2,880 visual tokens when processing a high-resolution image, while Qwen2.5-VL can take in as many as 16,384 visual tokens for multi-image or video inputs, a scale far beyond the input lengths traditional language models handle. As token counts surge, input sequences grow longer, and because the computational cost of self-attention grows quadratically, the inference stage ...
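The excerpt does not spell out VScan's exact pruning criteria; a minimal sketch of the general idea it describes, two stages of importance-score-based visual token pruning, might look like the following (all function and parameter names are illustrative assumptions, not VScan's API):

```python
import torch

def prune_visual_tokens(tokens: torch.Tensor, scores: torch.Tensor,
                        keep_ratio: float) -> torch.Tensor:
    """Keep the top-k visual tokens ranked by an importance score.

    tokens: (N, D) visual token embeddings
    scores: (N,)  per-token importance, e.g. attention received
    """
    k = max(1, int(tokens.size(0) * keep_ratio))
    keep = torch.topk(scores, k).indices.sort().values  # preserve original order
    return tokens[keep]

# Stage 1: coarse pruning right after the vision encoder, scoring each
# patch token by the attention it receives from the global [CLS] token.
# Stage 2: finer pruning inside the LLM, scoring the surviving tokens by
# the cross-attention they receive from the text tokens.
def two_stage_prune(visual_tokens, cls_attn, text_attn,
                    keep1: float = 0.5, keep2: float = 0.5):
    kept = torch.topk(cls_attn, max(1, int(len(cls_attn) * keep1))).indices.sort().values
    stage1_tokens = visual_tokens[kept]
    stage1_scores = text_attn[kept]  # text_attn assumed indexed like the inputs
    return prune_visual_tokens(stage1_tokens, stage1_scores, keep2)
```

Because the pruning happens outside the attention kernel itself, a scheme like this stays compatible with fused kernels such as FlashAttention, which matches the plug-and-play claim above.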
An in-depth look at the AI glasses industry: how to strike gold in a trillion-yuan market?
2025-07-03 15:28
Summary (2025-07-03)

- As the hardware vehicle for bringing large AI models to mobile devices, AI glasses are both scarce and high-growth. Their core functions include replacing Bluetooth earphones and action cameras, and potentially even smartphones, giving them enormous market potential.
- The Ray-Ban Meta glasses, jointly launched by Meta and Ray-Ban, are a global hit, selling 1.42 million units in 2024. Their success lies in looking no different from ordinary glasses while adding AI interaction, with standout value for money, acceptable latency, and long battery life.
- Market sizing suggests AI glasses have replacement potential of roughly RMB 170 billion in audio, RMB 30 billion in action cameras, and RMB 1.8 trillion in AR displays, with vast long-term market space; global shipments could reach 1.4 billion units within three to five years.
- Global AI glasses sales in 2024 were about 1.52 million pairs, a penetration rate of only 0.3%, but first-quarter 2025 sales grew 82% year on year. IDC expects 2025 sales to reach 15 million units, a 3.1% penetration rate; the market is in its early adoption phase and growing rapidly.
- The industry's core drivers include capital inflows as tech giants enter, technology iteration (such as the DeepSeek model and Micro LED displays), falling costs, and the hit-product effect; the domestic-substitution route cuts costs further through component replacement ...
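As a quick consistency check using only the figures quoted in the summary, the sales and penetration pairs can be inverted to recover the implied addressable base (a back-of-envelope sketch; "addressable base" is our label for whatever denominator IDC uses, which the excerpt does not specify):

```python
# Back-of-envelope check using only figures quoted in the digest.
sales_2024 = 1.52e6          # units sold in 2024
penetration_2024 = 0.003     # 0.3%
sales_2025e = 15e6           # IDC's 2025 estimate
penetration_2025e = 0.031    # 3.1%

base_2024 = sales_2024 / penetration_2024    # implied denominator, 2024
base_2025 = sales_2025e / penetration_2025e  # implied denominator, 2025

print(f"Implied base, 2024: {base_2024:,.0f}")  # ~507,000,000
print(f"Implied base, 2025: {base_2025:,.0f}")  # ~484,000,000
# Both pairs imply a denominator of roughly 500 million units, so the
# quoted sales and penetration figures are mutually consistent.
```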
How wide is the China-US AI gap, and where is the focus of AI competition? The Global AI Research Landscape Report makes its global debut
TMTPost App· 2025-07-03 10:36
Core Insights
- The report, titled "Global AI Research Landscape Report (2015-2024)", analyzes the evolution of AI research over the past decade, highlighting the competitive landscape between China and the United States in AI talent and publication output [2][7].

Group 1: AI Research Trends
- The report identifies four distinct phases in AI research: an initial phase (2015-2016), a rapid development phase (2017-2019), a maturity peak (2020-2023), and an adjustment phase (2024) [4][5].
- The number of AI papers published globally increased significantly, peaking at 17,074 papers in 2023, nearly a fourfold increase over 2015 [5][6].
- Publication volume is expected to decline to 14,786 papers in 2024, indicating a shift toward more specialized and application-oriented research [6].

Group 2: Talent Distribution
- China has emerged as the second-largest hub for AI talent, with 52,000 researchers by 2024, growing at a compound annual growth rate of 28.7% since 2015 (see the arithmetic sketch after this list) [8].
- The United States leads with over 63,000 AI researchers, with significant contributions from institutions like Stanford and MIT as well as tech giants like Google and Microsoft [8][9].
- Chinese institutions such as the Chinese Academy of Sciences, Tsinghua University, and Peking University lead in publication output and talent concentration [7][9].

Group 3: Institutional and Corporate Performance
- The Chinese Academy of Sciences published 4,639 top-tier papers, with Tsinghua University and Peking University following closely, showcasing China's institutional strength in AI research [7][9].
- In contrast, U.S. companies like Google, Microsoft, and Meta have significantly higher average publication output than their Chinese counterparts, reflecting a disparity in research investment and output capability [9][10].
- The top three U.S. companies published 5,896 papers, 1.8 times the output of the top three Chinese companies [9][10].

Group 4: Gender Disparity in AI Talent
- The report highlights a significant gender imbalance in AI research: women make up only 9.3% of AI talent in China, compared with 20.1% in the U.S. [12][13].
- Female representation at Chinese institutions like Tsinghua University and Peking University is low, at 7.88% and 9.18% respectively, versus 25-30% at top U.S. institutions [12][13].

Group 5: Future Trends in AI Research
- Deep learning has dominated AI research over the past decade, but its growth rate is expected to slow, suggesting a need for new approaches [14][15].
- Emerging technologies such as Transformers are gaining traction, particularly in natural language processing and multimodal AI, indicating a shift in research focus [15].
- The integration of traditional AI fields with deep learning techniques is becoming more prevalent, reflecting a trend toward collaborative, interdisciplinary research [15].
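The 28.7% CAGR figure can be sanity-checked against the 2024 headcount (a rough arithmetic sketch; the report's exact 2015 baseline is not quoted in this excerpt):

```python
# 52,000 researchers in 2024 at a 28.7% CAGR since 2015 implies a
# 2015 baseline of roughly 52,000 / 1.287**9.
talent_2024 = 52_000
cagr = 0.287
years = 2024 - 2015          # 9 compounding periods

implied_2015 = talent_2024 / (1 + cagr) ** years
print(f"Implied 2015 headcount: {implied_2015:,.0f}")  # ~5,400
```

So the quoted growth rate implies China started the decade with only a few thousand AI researchers, which is consistent with the report's story of rapid catch-up.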
vivo cracks the on-device AI deployment problem, sidesteps MoE architecture limits, and runs smoothly on the Snapdragon 8 Elite | ICCV 2025
量子位· 2025-07-03 09:00
Contributed by the GenieBlue team
量子位 | WeChat official account QbitAI

As AI enters the multimodal era, getting large models onto phones has become the focus of industrial deployment. Existing MLLMs face two major problems when deployed on phones. To tackle them, vivo AI Research, together with teams from CUHK and Shanghai Jiao Tong University, systematically analyzed, from both training data and model structure, how to preserve pure language ability during MLLM training, and on that basis proposed GenieBlue, an efficient MLLM architecture designed for mobile phone NPUs. The work has been accepted to ICCV 2025.

Main contributions and technical highlights

1. After adding multimodal capability, existing on-device LLMs lose over 10% accuracy on pure language tasks. GenieBlue freezes the original LLM parameters and introduces duplicated Transformer layers plus lightweight LoRA modules, preserving the original language ability throughout multimodal training (see the sketch below).
2. Through large-scale fine-tuning, GenieBlue reaches multimodal capability on par with mainstream MLLMs while fully retaining the original pure-language performance.
3. It avoids the MoE architecture, which current NPUs do not support, and adopts an inference strategy that does not share the base model, running smoothly on phones equipped with the Qualcomm Snapdragon 8 Elite (4th generation) chip.

Technical background

1. Current on-device MLLMs cannot achieve satisfactory pure language ability.
On MATH (hard objective problems), AlignBench, and MT- ...
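GenieBlue's exact layer-duplication scheme is not detailed in this excerpt; a minimal sketch of the recipe it describes, freezing the base LLM and training only added low-rank adapters, might look like this (module names and the routing comment are illustrative assumptions, not vivo's code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze original LLM weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path + small trainable correction
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage sketch: wrap the projections of a duplicated decoder layer, then
# route multimodal tokens through the trainable copy while text tokens
# keep the frozen path. (The routing policy here is our assumption.)
layer = LoRALinear(nn.Linear(2048, 2048))
out = layer(torch.randn(1, 16, 2048))
```

Because the frozen path is untouched, pure-language behavior is bit-identical to the original model, which is the property the 10%-accuracy-drop observation motivates.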
The world's first survey of VLA for autonomous driving is out: a complete breakdown of VLA driving models
具身智能之心· 2025-07-03 08:22
Source: 自动驾驶之心 | Authors: Sicong Jiang et al.

Today 自动驾驶之心 shares the latest work from research teams at McGill University, Tsinghua University, Xiaomi, and the University of Wisconsin-Madison: a survey of vision-language-action models for autonomous driving.

When vision, language, and action are fused in a single model, where will the future of autonomous driving lead? Recently, the joint team released the world's first comprehensive survey of Vision-Language-Action (VLA) models for the autonomous driving field. The paper, titled "A Survey on Vision-Language-Action Models for Autonomous Driving", ...
A first! World model and action model fused: the fully autoregressive WorldVLA arrives
机器之心· 2025-07-03 08:01
Core Viewpoint
- Alibaba's Damo Academy has introduced WorldVLA, a model that integrates a world model and an action model into a unified autoregressive framework, enhancing understanding and generation across text, images, and actions [1][4].

Research Overview
- Vision-Language-Action (VLA) models have become a significant focus in robotic action modeling; they are typically built on large-scale pretrained multimodal language models (MLLMs) with action-output capability added [4].
- Existing VLA models often lack a deep understanding of actions, treating them merely as outputs rather than analyzing them as inputs [5].

Model Description
- WorldVLA addresses the limitations of both VLA models and world models through a unified autoregressive mechanism for understanding and generating actions and images [5][10].
- It employs three independent encoders for image, text, and action data, sharing a single vocabulary to facilitate cross-modal tasks [12].

Mechanism and Strategy
- The world-model component generates visual representations from input actions, learning the physical dynamics of the environment, while the action-model component enhances visual understanding [7].
- An action attention masking strategy mitigates error accumulation when generating multiple actions, significantly improving performance on action-chunking tasks (a sketch of one possible mask appears below) [8][14].

Experimental Results
- On the LIBERO benchmark, WorldVLA achieved a 4% improvement in grasp success rate over traditional action models and a 10% reduction in Fréchet Video Distance (FVD) compared with traditional world models [8].
- The attention mask strategy improved grasp success rates by 4% to 23% on action-chunking tasks [8].

Comparative Analysis
- WorldVLA outperformed other models across multiple metrics, demonstrating the effectiveness of integrating action and world modeling [18].
- The model's ability to generate the next frame from actions and images showcases its advanced visual-prediction capability [24].
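The excerpt does not specify the exact mask pattern; a minimal sketch of one plausible reading, blocking later action tokens from attending to earlier generated action tokens so their errors do not propagate through a chunk, could look like this (purely illustrative, not Damo Academy's implementation):

```python
import torch

def action_attention_mask(seq_len: int, action_positions: list[int]) -> torch.Tensor:
    """Causal mask with an extra rule: action tokens may not attend to
    earlier action tokens, only to the image/text context.

    Returns a (seq_len, seq_len) boolean mask; True = attention allowed.
    """
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    for q in action_positions:
        for k in action_positions:
            if k < q:
                mask[q, k] = False  # hide earlier actions from later ones
    return mask

# Example: tokens 0-3 are image/text context, tokens 4-6 are an action chunk.
m = action_attention_mask(7, action_positions=[4, 5, 6])
print(m.int())  # rows 4-6 attend to context but not to one another
```

Under this reading, each action in a chunk is conditioned on the observation rather than on possibly erroneous sibling actions, which is one way the reported 4% to 23% action-chunking gains could arise.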
Rich expression showcases the charm of the Chinese language
People's Daily· 2025-07-03 00:31
A painter I know gave a friend a fan bearing his calligraphy and painting, but the friend was displeased, reasoning that the word for fan (shan) sounds like "scatter" (san) and is therefore unlucky. In fact, giving a decorated fan or an umbrella is a gesture of goodwill; there is no need to be put off by a homophone.

Varied expression refracts many shades of feeling and reflects the color of life. "Green, green the bamboo, straight in form; its splendid tall stalks spread winged fruit"; "a bamboo umbrella shades the cloud-wrapped path, rattan sandals tread the mossy stones": in classical poetry, the fan and the umbrella mostly carry positive meanings. Chinese people have long used homophones to express good fortune. For example, "sheep" (yang) puns on "auspicious" (xiang); the fan suggests being understanding and considerate (shan jie ren yi); and "cat" (mao) with "butterfly" (die) puns on maodie, a wish for long life. Such rich expression is a vivid portrait of the breadth and depth of Chinese language culture.

Language is a living river: only as new meanings and expressions flow in does it keep its vitality and its power to portray the times. Today, some inventive "internet-speak" circulates widely, showing that innovation and creativity are giving the Chinese ...

It should also be remembered that "language is the house of being" and every person's cultural root. We should cultivate open minds, but also the discipline to use language properly. Avoiding the distortion of beautiful linguistic imagery and refusing crude slang and stale memes is a question of how to use language and express ourselves better, and also of how to treat culture and live better. Enriching expression and savoring culture in the dialectic between innovation and fidelity will keep the great river of language clear and full of life.

People's Daily, July 3, 2025, page 5
The world's first survey of VLA for autonomous driving is out: a complete breakdown of VLA driving models (McGill & Tsinghua et al.)
自动驾驶之心· 2025-07-02 13:54
Paper authors | Sicong Jiang et al.
Editor | 自动驾驶之心

"Has the future of autonomous driving already arrived?" When vision (Vision), language (Language), and action (Action) are fused in a single model, where will the future of autonomous driving lead?

Recently, a joint research team from McGill University, Tsinghua University, Xiaomi, and the University of Wisconsin-Madison released the world's first comprehensive survey of Vision-Language-Action (VLA) models for the autonomous driving field. The paper, titled "A Survey on Vision-Language-Action Models for Autonomous Driving", systematically ...
VQ-VLA: a large-scale synthetic-data-driven action tokenizer with nearly 3x faster inference
具身智能之心· 2025-07-02 10:18
Authors | Yating Wang et al.
Editor | 具身智能之心
This article is shared for academic purposes only.

Background

Vision-language-action (VLA) models face two major challenges in multimodal robot control:

1. Inefficient action representation: traditional discretization of continuous actions (such as uniform binning) struggles to capture complex spatiotemporal dynamics, so accumulated error grows in long-horizon tasks.
2. Data-dependence bottleneck: collecting real robot data is expensive, limiting model generalization.

Core contributions

- A general action tokenizer framework: a tokenizer based on a convolutional residual VQ-VAE, replacing traditional binning discretization (see the sketch below).
- Synthetic-data-driven scaling: the first demonstration that the synthetic-to-real domain gap for action trajectories is minimal (Table 3 shows a VQ trained purely on synthetic data approaches mixed-data performance on real tasks), training the tokenizer on ultra-large-scale synthetic data (100x prior work).
- Comprehensive performance gains: significant improvement on three core VLA metrics, including success rate: up to 30% higher on long-horizon tasks (real-robot experiments, Figure 3).

Key technical approach

1. Convolutional residual VQ-VA ...
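The tokenizer details are cut off above; a minimal sketch of residual vector quantization, the core mechanism a "convolutional residual VQ-VAE" builds on, might look like this (codebook sizes, dimensions, and names are illustrative assumptions, not VQ-VLA's code):

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Residual vector quantization: each stage's codebook encodes the
    residual left by the previous stage, so a few small codebooks
    approximate one very large one."""
    def __init__(self, dim: int = 64, codebook_size: int = 256, stages: int = 4):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(stages)
        )

    def forward(self, z: torch.Tensor):
        # z: (batch, time, dim) latents from a convolutional action encoder
        B, T, D = z.shape
        residual = z
        quantized = torch.zeros_like(z)
        codes = []
        for cb in self.codebooks:
            # nearest codebook entry for the current residual
            dists = torch.cdist(residual.reshape(-1, D), cb.weight)  # (B*T, K)
            idx = dists.argmin(dim=-1).reshape(B, T)
            q = cb(idx)                       # (B, T, D) quantized residual
            quantized = quantized + q
            residual = residual - q
            codes.append(idx)
        # codes: one token id per stage per timestep -> discrete action tokens
        return quantized, torch.stack(codes, dim=-1)
```

Relative to uniform binning, the stacked codebooks give a much finer reconstruction of continuous action trajectories at the same token budget, which is the property the accumulated-error claim above turns on.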