The "Huangpu Military Academy" of autonomous driving that grinds away at technology turns three...
自动驾驶之心· 2025-07-19 03:04
Click the card below to follow the "自动驾驶之心" official account. Tap here -> to receive learning roadmaps for nearly 15 autonomous-driving directions.

Autonomous driving is at a critical stage in the leap from assisted driving (L2/L3) to high-level driverless operation (L4/L5). In 2025, autonomous driving, embodied intelligence, and large-model agents are the three tracks at the high ground of AI competition. If you have a strong interest in autonomous driving and want to exchange ideas with the most professional experts in the industry, this community is the right place!

Intelligent driving has many directions. If you want to work on end-to-end, I suggest skipping a dedicated course on traditional planning and control: learn end-to-end directly and fill in gaps in traditional planning and control as you encounter them. Start with BEV perception to understand the principles, then move to two-stage end-to-end, one-stage end-to-end, and finally end-to-end empowered by large models.

The frontier moves fast. Is there a professional technical community that continuously follows cutting-edge academic research and industrial mass-production deployment? With that question in mind, we built the "自动驾驶之心知识星球" (Knowledge Planet). We have prepared a large newcomer discount... Welcome to scan the code and join! It is the largest autonomous-driving learning community in China.

What content modules does the Knowledge Planet include? Built around our thinking on technology, the planet consists of four major sections. Below we share what the planet has compiled across the four most cutting-edge technical directions: vision-language models, world models, diffusion models, and end-to-end autonomous driving. Frontier ...
Baidu Group-SW (09888): Core advertising business under pressure from the AI search transformation; Luobo Kuaipao continues to lead the Robotaxi industry
Investment Rating
- The report maintains a "Buy" rating for Baidu Group [2][7]

Core Views
- Baidu's core advertising business is expected to face pressure from the AI search transformation, with a projected revenue decline of 16.3% year-on-year in Q2 2025 [7]
- Baidu's Robotaxi service, "Luobo Kuaipao," is leading the global market, with order volume up 75% year-on-year to 1.44 million in Q1 2025 [7]
- The intelligent cloud business is growing rapidly on demand for generative AI and large language models, with Q1 2025 cloud service revenue expected to grow 42% year-on-year [7]
- The overall revenue forecast for 2025-2027 has been adjusted to a decline of 5.2% in 2025, followed by growth of 4.4% and 4.8% in 2026 and 2027, respectively [7]
- The target price for Baidu has been revised down to HKD 95.15 based on a DCF valuation [7]

Financial Projections
- Revenue projections: 2024: 133,125 million CNY; 2025: 126,265 million CNY; 2026: 131,853 million CNY; 2027: 138,172 million CNY [2][12]
- Net profit projections: 2024: 23,760 million CNY; 2025: 18,324 million CNY; 2026: 20,200 million CNY; 2027: 22,172 million CNY [2][12]
- The P/E ratio is projected at 10.9 in 2024, rising to 14.1 in 2025, then falling to 11.7 by 2027 [2][12]

Business Segments
- Core online marketing service revenue is expected to decline 15.3% in 2025, while cloud service revenue is projected to grow 22.2% [8]
- The iQIYI segment is expected to see a slight revenue decline of 1.0% in 2025 [8]

Valuation Metrics
- The report's DCF valuation breakdown indicates a total enterprise value of approximately 370.45 billion CNY, with equity value at 287.44 billion CNY [9][10]
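As a quick sanity check (ours, not part of the report), the stated 2025-2027 growth rates can be recomputed directly from the revenue line:

```python
# Recompute the year-on-year growth rates implied by the report's revenue
# projections (millions of CNY). The figures come from the summary above;
# the arithmetic check itself is ours.
revenue = {2024: 133_125, 2025: 126_265, 2026: 131_853, 2027: 138_172}

growth = {
    year: (revenue[year] / revenue[year - 1] - 1) * 100
    for year in (2025, 2026, 2027)
}

for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")
# Matches the stated -5.2% (2025), +4.4% (2026), +4.8% (2027).
```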
AI Day Livestream | LangCoop: Autonomous driving reasons in the paradigm of "human language" for the first time
自动驾驶之心· 2025-07-18 10:32
Paper: https://www.arxiv.org/pdf/2504.13406

Talk introduction

Can't get enough of the livestream highlights? The full in-depth version, covering every technical detail, the Q&A, and unreleased extras, is now exclusively available on the "自动驾驶之心" Knowledge Planet.

Multi-agent collaboration enables information sharing among connected agents and shows great potential for improving the safety, reliability, and maneuverability of autonomous driving systems. However, existing multi-agent communication methods are constrained by the inherent limitations of current communication media, including high bandwidth requirements, agent heterogeneity, and information loss.

To address these challenges, this paper proposes LangCoop, a new paradigm for collaborative autonomous driving that uses natural language as a compact and expressive inter-agent communication medium. LangCoop contains two key innovations:

Through extensive experiments in the CARLA simulation environment, the paper shows that, compared with image-based communication, LangCoop reduces communication bandwidth by up to 96% (< 2 KB per message) while maintaining ...
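To make the "< 2 KB per message" budget concrete, here is an illustrative sketch. The message fields below are our invention for the example; the paper's actual message schema is not shown in this excerpt.

```python
import json

# A hypothetical LangCoop-style natural-language inter-agent message.
# Field names are placeholders of ours, not the paper's schema.
message = {
    "agent_id": "vehicle_07",
    "intent": "merging into the left lane within 3 seconds",
    "observation": "pedestrian crossing ahead on the right, about 15 m away",
    "request": "please hold your speed so I can merge safely",
}

payload = json.dumps(message).encode("utf-8")
print(len(payload), "bytes")

# Even a fairly verbose message stays well under the stated 2 KB budget,
# versus tens of kilobytes for a compressed camera frame.
assert len(payload) < 2 * 1024
```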
Presentation-generation black tech: PresentAgent, from text to presentation video
机器之心· 2025-07-18 08:18
This project is joint work by AI Geeks, the Australian Institute for Artificial Intelligence, the University of Liverpool, and La Trobe University.

We present PresentAgent, a multimodal agent that transforms long documents into narrated presentation videos. Most existing methods are limited to generating static slides or text summaries; our approach goes beyond these limits, producing tightly synchronized visual content and spoken narration that realistically mimics a human-style presentation.

To achieve this, PresentAgent uses a modular pipeline, shown in Figure 1:
1. Systematically segment the input document;
2. Plan and render slide-style visual frames;
3. Generate contextual spoken narration with a large language model and a text-to-speech model;
4. Precisely align the audio with the visuals and compose them seamlessly into a complete video.

Figure 1: PresentAgent overview. The system takes a document (e.g., a web page) as input and runs the following pipeline: (1) document processing, (2) structured slide generation, (3) synchronized caption creation, and (4) speech synthesis. The final output is a presentation video combining slides with synchronized narration. Purple highlights in the figure mark key intermediate outputs of the generation process.

Given how hard such multimodal output is to evaluate, we introduce PresentEval, a unified evaluation framework driven by vision-language models that scores along three key dimensions: content fidelity (Con ...
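The four-stage pipeline above can be sketched as follows. All function names are placeholders of ours; PresentAgent's real implementation and model calls are not public in this excerpt, so the stand-ins below only mirror the data flow.

```python
# Minimal sketch of the four-stage pipeline, assuming hypothetical helpers.

def make_presentation_video(document: str) -> list[tuple[str, str]]:
    """Return (slide, narration) pairs, one per document segment."""
    segments = segment_document(document)      # 1. split into sections
    video = []
    for seg in segments:
        slide = render_slide(seg)              # 2. slide-style visual frame
        narration = narrate(seg)               # 3. LLM + TTS narration
        video.append((slide, narration))       # 4. pair audio with visuals
    return video

# Trivial stand-ins so the sketch runs end to end.
def segment_document(doc: str) -> list[str]:
    return [p for p in doc.split("\n\n") if p.strip()]

def render_slide(segment: str) -> str:
    return f"[slide] {segment[:40]}"

def narrate(segment: str) -> str:
    return f"[narration] {segment[:40]}"

pairs = make_presentation_video("Intro paragraph.\n\nMethod paragraph.")
print(len(pairs))  # 2
```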
Superpowers in big history | Book recommendation
腾讯研究院· 2025-07-18 08:18
Core Viewpoint
- The article discusses the evolution of intelligence from early mammals to modern AI, emphasizing that intelligence can compensate for physical limitations and that historical events significantly influence the development of intelligence [3][4][11]

Group 1: Evolution of Intelligence
- The first breakthrough in brain evolution occurred 550 million years ago, allowing organisms to differentiate between stimuli and develop basic emotional responses with only a few hundred neurons [4]
- The second breakthrough involved the advanced use of dopamine in vertebrates, enabling them to quantify the likelihood of rewards and develop curiosity through complex actions [5]
- The third breakthrough was the development of the neocortex in mammals, which allowed for imagination and planning, akin to the slow thinking described by Daniel Kahneman [5][6]

Group 2: AI and Intelligence
- AI has improved significantly through reinforcement learning that rewards processes rather than just outcomes, allowing learning from each step instead of waiting for the end result [5]
- Current AI models, particularly large language models, demonstrate an understanding of language beyond mere memorization, indicating a significant advance in AI capabilities [7][10]
- Potential future breakthroughs may involve combining human and AI intelligence, enabling AI to simulate multiple worlds or understand complex rules in novel ways [11][12]

Group 3: Historical Context of Breakthroughs
- Historical events, such as the asteroid impact that led to the extinction of the dinosaurs, created opportunities for the evolution of mammals and the development of intelligence [3][15]
- The article suggests that significant changes in the world often arise from unexpected, radical shifts rather than gradual improvements [16][17]
Claude Code's creator: Stop obsessing over feature piling! The best AI tools hand control back to you
AI科技大本营· 2025-07-18 07:40
Compiled from ai.engineer | Editor: Wang Qilong | Produced by CSDN (ID: CSDNnews) | Original: youtube.com/watch?v=Lue8K2jqfKk | Submissions or coverage requests: zhanghy@csdn.net

A talk from the AI Engineer World's Fair that we compiled a few days ago was well received, so here is another, this time from Anthropic, OpenAI's arch-rival. Claude Code is currently riding high as programmers' new tool of choice.

If OpenAI's Sean Grove took a lofty, philosophical view of the nature of our work, the value shift from "code" to "specification", then Boris Cherny, the creator of Claude Code, delivered a very different kind of talk.

His core viewpoint can be summarized as a kind of "minimalist philosophy": at a time when model capabilities change daily and best practices have not yet settled, the best AI tool may not be a feature-laden "cathedral" but a simple, general-purpose, unopinionated "Lego brick". It does not try to decide your workflow for you; it gives you the lowest-level, most primitive power, so you can create, combine, and define the way of working that suits you best ...
Google releases the Gemini embedding model, expanding foundation-layer NLP capabilities
Investment Rating
- The report does not explicitly provide an investment rating for the industry or the specific companies involved

Core Insights
- Google's release of the Gemini embedding model marks a significant advance in NLP capabilities, achieving a score of 68.37 on the MTEB, surpassing OpenAI's 58.93 and establishing it as the leading embedding model [1][12]
- The ultra-low pricing of $0.15 per million tokens is expected to democratize access to embedding capabilities, significantly lowering barriers for small and medium businesses, educators, and freelancers [2][14]
- The Gemini model strengthens Google's AI infrastructure, extending it from content generation to a comprehensive semantic-understanding platform and reinforcing its competitive edge in the AI workflow [3][15]

Summary by Sections

Event
- On July 15, 2025, Google launched the Gemini embedding model, achieving a record score of 68.37 on the MTEB, and set a competitive price of $0.15 per million tokens [1][12]

Commentary
- The Gemini model excels across nine major task categories, showcasing its versatility and strong performance in applications such as semantic retrieval and classification [2][13]
- The aggressive pricing strategy is expected to disrupt the market, compelling competitors to reassess their pricing structures [5][18]

Strategic Implications
- The Gemini embedding model signals a strategic shift for Google, enhancing its capabilities in AI systems that require task matching and context retention [3][16]
- The embedding layer is projected to become a new value center in AI workflows, indicating a transition from compute-centric to semantic-centric infrastructure [5][18]
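To put the $0.15-per-million-tokens price in perspective, here is a rough cost illustration. The corpus size and average token count are assumptions for the example, not figures from the report.

```python
# Rough cost illustration (ours): embedding a modest corpus at the stated
# price. Token counts are assumed averages, not measured values.
price_per_million_tokens = 0.15          # USD, from the report
docs = 10_000                            # assumed corpus size
avg_tokens_per_doc = 500                 # assumed average document length

total_tokens = docs * avg_tokens_per_doc          # 5,000,000 tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"${cost:.2f}")  # $0.75
```

At this price point, embedding-based semantic search over tens of thousands of documents costs well under a dollar, which is the basis of the report's "democratization" argument.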
Why does it work in practice? How does goal navigation recognize targets and navigate?
具身智能之心· 2025-07-18 03:21
Goal-driven navigation gives robots the ability to complete navigation goals autonomously.

Embodied navigation, a core area of embodied intelligence, rests on three technical pillars: language understanding, environment perception, and path planning. Goal-Oriented Navigation, which endows robots with autonomous decision-making, is the most representative direction within embodied navigation. It requires an agent in an unfamiliar 3D environment to explore and plan paths using only a goal description, such as coordinates, an image, or natural language.

Unlike traditional vision-language navigation (VLN), which relies on explicit instructions, a goal-driven navigation system must make the leap from "understand the instruction and walk the right path" to "understand the world and find the path itself". When a human issues the instruction "go to the kitchen and fetch a cola", the robot must autonomously perform semantic parsing (recognizing the kitchen's spatial features and the cola's visual attributes), environment modeling (building a spatial topology of the home scene), and dynamic decision-making (avoiding moving people or pets). Behind this lie breakthroughs at the intersection of computer vision, reinforcement learning, and 3D semantic understanding.

Goal-driven navigation has already been industrialized in several vertical domains. In last-mile delivery, combined with social navigation algorithms, it lets robots handle dynamic environments and human interaction: Meituan's autonomous delivery vehicles execute deliveries in complex urban environments via dynamic path replanning, and Starship Technologies' campus delivery robots have been deployed at universities and communities across Europe and the US. In healthcare, hotel, and catering scenarios, 嘉 ...
ICCV 2025 | One image is all you need: multimodal instruction data synthesis. Just supply the images and leave the rest to Oasis
机器之心· 2025-07-18 03:14
Core Viewpoint
- The article discusses a novel multimodal instruction data synthesis method called Oasis, which eliminates the need for complex prompt design by relying solely on images for data generation, thereby improving the efficiency and quality of data synthesis [1][6]

Research Motivation
- Traditional multimodal data synthesis methods suffer from a lack of diversity, insufficient quality, and heavy reliance on manual input, which Oasis aims to address [7][8]

Method Introduction
- Oasis operates in three main steps: constructing a hooking prompt for autoregressive sampling, classifying the sampling results to retain instruction-type outputs, and performing quality control and response generation [11][12]

Data Characteristics Analysis
- The Oasis dataset, Oasis-500k, was synthesized from approximately 500,000 images, and scales linearly with the number of images [21][22]
- The average instruction length of Oasis data is 76.80 and the average response length is 71.16, indicating richer information content than LLaVA-NeXT [24]
- Language diversity in Oasis data includes English (78.52%), Chinese (18.66%), and several other languages, showing broad applicability [27]

Experimental Results
- Oasis shows significant improvements over baseline models, with average accuracy gains of 3.1% for Vicuna1.5, 1.8% for Qwen2.5, and 3.2% for Llama3 [38]
- Adding 500k Oasis data produced an average score increase of 5.2%, confirming the effectiveness of data scaling [41]

Effectiveness of Oasis
- Oasis is effective at synthesizing domain-specific data, particularly for OCR tasks, yielding notable gains on the relevant benchmarks [43]

Quality Control Mechanism
- The quality control mechanism for instructions is essential: it significantly improves model performance, with a noted increase of over 7% on specific tasks [50]
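The three steps summarized under Method Introduction can be sketched schematically. The helper functions below are hypothetical placeholders of ours; the paper's actual sampling model, classifier, and quality filters are not reproduced here.

```python
# Schematic sketch (ours) of the three Oasis steps: image-only hooked
# sampling, instruction filtering, then quality control + response
# generation. All helpers are trivial stand-ins, not the paper's models.

def synthesize(images):
    dataset = []
    for image in images:
        # 1. Hook the MLLM with the image alone and autoregressively
        #    sample a free-form continuation (no hand-written task prompt).
        candidate = sample_from_image(image)
        # 2. Classify the sample; keep only instruction-type outputs.
        if not is_instruction(candidate):
            continue
        # 3. Quality-control the instruction, then generate its response.
        if passes_quality_control(candidate):
            dataset.append({
                "image": image,
                "instruction": candidate,
                "response": generate_response(image, candidate),
            })
    return dataset

# Trivial stand-ins so the sketch runs.
def sample_from_image(image):
    return f"Describe the chart shown in {image}."

def is_instruction(text):
    return text.endswith((".", "?"))

def passes_quality_control(text):
    return len(text) > 10

def generate_response(image, instruction):
    return f"(response for {image})"

data = synthesize(["img_001.png"])
print(len(data))  # 1
```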
Tomorrow: watch and learn at the ACL 2025 paper-sharing meetup. Last call for registration
机器之心· 2025-07-18 03:14
Core Insights
- The AI field continues to be exciting in 2025, with numerous research releases from major tech companies and institutions [1]
- The rapid pace of technological advances in AI is overwhelming, with new models emerging almost weekly [3][4]
- Developers and researchers are increasingly turning to conferences and academic sharing to stay current with cutting-edge research [5]

Event Overview
- The ACL 2025 conference, a major event in the NLP field, will take place from July 27 to August 1 in Vienna, Austria, with a record number of over 8,000 submissions [6][21]
- The conference will feature keynote speeches, paper presentations, roundtable discussions, and poster sessions [6][21]

Keynote Speakers and Topics
- The morning keynote will be given by Che Wanxiang, focusing on trends and outlooks for ACL 2025 [10][20]
- The afternoon keynote by Liu Pengfei will cover reinforcement learning and complex reasoning in large models [22][24]

Paper Presentations
- Topics include social exchange theory with large language models, metaphor-driven communication, and the dark side of LLMs [11][12][14]
- The event will also include a roundtable discussion on the value of "context engineering" featuring experts from various institutions [26][31][35]

Poster Sessions
- Authors will present their papers and posters during the event, with live streaming available on multiple platforms for broader access [37]