EMMA

Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo
AI Engineer· 2025-07-26 17:00
Autonomous Driving History and Challenges
- Autonomous driving research started in the 1980s with simple neural networks and evolved to end-to-end driving models by 2020 [2]
- Scaling autonomous driving is hard because the system must handle long-tail events and rare scenarios [5][7]
- Foundation models such as Gemini show promise in generalizing to rare driving events and responding appropriately [8][9][10][11]
EMMA: A Multimodal Large Language Model for Autonomous Driving
- Waymo is exploring EMMA, a driving system built on Gemini that takes routing text and camera input and predicts future waypoints [11][12][13][14]
- EMMA is self-supervised, camera-only, and free of high-definition maps, and it achieves state-of-the-art quality on the nuScenes benchmark [15][16][17]
- Chain-of-thought reasoning is incorporated into EMMA, allowing the model to explain its driving decisions and improving performance on a 100k dataset [17]
Evaluation and Validation
- Evaluation is crucial to the success of autonomous driving models and spans open-loop evaluation, simulation, and real-world testing [25]
- Generative models are being explored for sensor simulation, so the planner can be evaluated under varied conditions such as rain and different times of day [26][27][28]
Future Directions
- Waymo aims to improve generalization and scale autonomous driving by leveraging foundation models [30]
- Training on larger datasets improves the quality of the planner [19][20]
- Waymo is exploring training on multiple tasks, such as 3D detection and road graph estimation, to create a more generalizable model [21][22][23][24]
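The summary does not say which metric is used for open-loop evaluation. As a minimal sketch, assuming a planner that outputs (x, y) waypoints, one common choice is the average displacement error between predicted waypoints and the logged trajectory. The function name and the sample coordinates below are illustrative assumptions, not values from the talk.

```python
import math

def average_displacement_error(predicted, logged):
    """Mean Euclidean distance between matching predicted and logged waypoints."""
    if len(predicted) != len(logged) or not predicted:
        raise ValueError("trajectories must be non-empty and the same length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, logged))
    return total / len(predicted)

# Hypothetical (x, y) waypoints in metres at three future timesteps.
predicted = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
logged = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]

ade = average_displacement_error(predicted, logged)
print(f"ADE: {ade:.2f} m")
```

Open-loop scores like this only compare a plan against what the logged driver did; they do not capture how the scene would react to the plan, which is why the summary lists simulation and real-world testing alongside them.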
NBIS vs. GOOGL: Which AI Infrastructure Stock is the Smarter Buy?
ZACKS· 2025-07-21 14:21
Key Takeaways
- NBIS posted 385% YoY revenue growth in Q1 and targets $750M-$1B ARR with major AI infra expansion.
- GOOGL is investing $75B in 2025 to build AI-focused infrastructure, servers, and data centers at scale.
- NBIS forecasts negative adjusted EBITDA for 2025, while GOOGL generated $36.15B in Q1 operating cash flow.
Nebius Group N.V. (NBIS) is an upcoming player in the AI-infrastructure market, while Alphabet (GOOGL) is an established tech behemoth. The AI boom is fueling an unprecedented surge in dem ...
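Computer Industry Commentary Report: Kimi: Dual Breakthroughs with Researcher and K2, Driven by the Twin Engines of Reinforcement Learning Innovation and Open-Source Intelligence
Huaxin Securities· 2025-07-21 13:34
Computer Industry Commentary Report, July 21, 2025. Rating: Recommended (maintained).
Analyst: Bao Youchen (宝幼琛), S1050521110002, baoyc@cfsc.com.cn. Contact: Xie Mengjin (谢孟津), S1050123110012, xiemj@cfsc.com.cn.
[Chart: market performance (%), Computer sector vs. CSI 300. Source: Wind, Huaxin Securities Research]
Related research:
1. "Computer Industry Weekly: Grok4 Tops the Leaderboards and Validates the Scaling Law; Amap Launches the Xiaogao Agent", 2025-07-16
2. "Computer Industry Weekly: Google Releases New Multimodal Model Gemma3n; Alibaba DAMO Academy Releases Medical AI Model DAMOGRAPE", 2025-06-30
3. "Computer Industry Commentary Report: Uber (UBER.O): Building Moats Through Strategic Technology Breakthroughs, Opening a New Chapter Through Its Ecosystem", 2025-06-28
Event: In June 2025, Moonshot AI launched Kimi-Researcher, which uses end-to-end reinforcement learning to perform multi-turn search and reasoning. On the Humanity's Last Exam benchmark it scored 26.9% Pass@1, setting a new ...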
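Institutions Expect Leapfrog Growth on the AI Application Side! ChinaAMC STAR Market AI ETF (589010) Trades Sideways!
National Business Daily (Mei Ri Jing Ji Xin Wen)· 2025-07-21 06:33
The ChinaAMC STAR Market AI ETF (589010) closely tracks the SSE STAR Market Artificial Intelligence Index and covers high-quality companies across the whole industry chain. It combines high R&D investment with policy support, and its 20% daily price limit and small- and mid-cap elasticity help it capture the AI industry's "singularity moment". (Source: National Business Daily)
Huayuan Securities believes that, with Kimi K2 officially released and open-sourced, domestic large models keep iterating and have achieved breakthroughs in foundational capabilities, which should continue to drive leapfrog development of domestic large models and the AI application side. It recommends watching the application-side changes that foundation-model breakthroughs bring, and tracking industry progress in AI plus gaming, education, video, e-commerce, toys, and marketing.
As of 14:07 today, the ChinaAMC STAR Market AI ETF (589010) was flat, oscillating around the zero line. Constituents were mixed: Orbbec led with a 6.27% gain, followed by UCloud, Luster LightTech, and Bestechnic, while Foxit Software and Montage Technology fell more than 2%. On liquidity, intraday turnover reached 17.47% and total trading value exceeded 13 million yuan, a marked release of volume as the adjustment took hold.
On the news front, ELU (Zhongneng Kunyu Technology Group, 中能坤域科技集团) officially announced its "one brain, multiple bodies, multiple scenarios" strategy on July 18 and launched its 原力光年 sub-brand together with the brand's first product, the AI Agent "LEMMA". It is reportedly the world's first L4-level energy AI agent, built on the company's self-developed ILM multimodal large model, combined with Transformer ...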
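ELU Launches the World's First L4-Level Energy AI Agent, with Three Brands Jointly Delivering "One Brain, Multiple Bodies, Multiple Scenarios"
IPO Zaozhidao (IPO早知道)· 2025-07-21 03:17
A redefinition of where the AI agent industry is headed.
Original article by IPO早知道. Author: Stone Jin. WeChat official account: ipozaozhidao.
According to IPO早知道, ELU (Zhongneng Kunyu Technology Group, 中能坤域科技集团) officially announced its "one brain, multiple bodies, multiple scenarios" strategy on July 18 and launched its 原力光年 sub-brand together with the brand's first product, the AI Agent "LEMMA".
Undeniably, in the second half of the AI industry, technical breakthroughs alone are no longer enough to build a moat. Real competitiveness comes from a deep understanding of the underlying business logic. ELU's answer is to use one brain, adapted to multiple carriers and covering all scenarios, to achieve exponential growth in data value.
"Traditional AI companies are doing addition; we are doing multiplication," said Bai Huiyuan (白惠源), ELU's founder and chairman. "When the same AI brain can simultaneously drive robots, AI agents, and multiple application scenarios, marginal cost approaches zero while value grows exponentially."
This shift in thinking is redefining the direction of the AI industry: from single-point technical competition to ecosystem competition, from product thinking to platform thinking, and from functional value to data value.
The world's first L4-level energy AI agent, with genuine "proactive awareness"
Faced with a trillion-scale energy market, traditional software tools are falling short. What the market needs is not a simple data-processing system but one that can, like a human, ...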
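LLM Confidence Collapses! Google DeepMind Confirms: Opposing Opinions Make GPT-4o Readily Abandon Correct Answers
QbitAI (量子位)· 2025-07-20 05:08
Wen Le (闻乐), reporting from Aofeisi. QbitAI | WeChat official account QbitAI
LLMs are too sycophantic! Even if you challenge an answer at random, a model as strong as GPT-4o may change its answer immediately.
Now a new study by Google DeepMind and the University of London finds that this behavior may not be sycophancy at all, but a lack of confidence. Moreover, the team found that large language models such as GPT-4o and Gemma 3 show two conflicting behaviors side by side: stubbornly sticking to their own view, and wavering as soon as they are challenged.
Put simply, the research explains why LLMs are sometimes confident and sometimes self-doubting. It comes down to two things: they tend to assume that what they said first is right, and they give too much weight to opposing opinions.
When an LLM shows confidence in its own answer, this is consistent with human cognition, since people usually defend their own views. However, when the model is overly sensitive to opposition and wavers into choosing another answer, it runs counter to the human tendency to support one's own view.
LLMs are overly sensitive to opposing opinions
Exploiting the fact that an LLM's confidence can be obtained without the model retaining any memory of its initial judgment, the researchers selected representative models such as Gemma 3, GPT4o, and o1-preview and designed a two-round answering experiment.
Here is the experimental procedure.
Round one is the initial answer: the answering LLM is given a binary-choice question, and a fictitious advice LLM then provides feedback. ...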
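OpenAI to Launch $50 Million Fund Supporting Nonprofits and Community Organizations; Kimi K2 Tops Global Open-Source Model Rankings | AIGC Daily
Cyzone (创业邦)· 2025-07-20 01:15
1. [Manus co-founder publishes long post summarizing lessons learned] On July 19, Manus co-founder Ji Yichao (季逸超) published a technical analysis running to more than a thousand characters on the official blog, reviewing the development approach and the lessons from the company's "ups and downs" since it went viral early this year. Reports had recently emerged that the AI agent platform Manus had laid off staff and cleared its accounts on multiple domestic platforms. (ITHome)
... On Friday local time, its CEO Aravind Srinivas told Reuters that the company is in talks with mobile device manufacturers (OEMs), hoping to have its newly launched Comet AI mobile browser preinstalled on smartphones. Srinivas said that "it is not easy to convince mobile OEMs to change the default browser from Chrome to Comet," and he stressed the challenge posed by user inertia on mobile platforms. (ITHome)
3. [OpenAI to launch $50 million fund supporting nonprofits and community organizations] OpenAI announced that it will launch an initial $50 million fund to support nonprofits and community organizations. The statement said that, through this fund, Ope ...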
What’s New in Google Accessibility | Episode 9 | American Sign Language
Google· 2025-07-16 14:03
Accessibility Innovations
- Google is releasing SignGemma, an open model for sign language understanding, focusing on American Sign Language (ASL) and English, with plans to translate other sign languages into spoken language text [1][2]
- Android expands Gemini integration into the TalkBack screen reader, providing AI-generated descriptions for images and the entire screen and enabling conversational questions and responses [4]
- Expressive Captions on Android now capture the intensity and nuance of speech, including emphasis and sounds like whispering or yawning [5][6]
- Pixel's Magnifier app introduces live search, highlighting matches on the screen and vibrating when something is found, aiding blind and low-vision users [6][7]
- Project Astra Visual Interpreter, in collaboration with Aira, is being tested to provide real-time descriptions of surroundings for blind and low-vision users, supervised by live Aira agents [8][9][10]
Chrome and Chromebook Updates
- Chrome now supports Optical Character Recognition (OCR) for scanned PDFs, allowing screen readers to interact with them [11][12]
- Chromebooks now offer the ability to turn off the touchpad and flash the screen for new notifications [12]
- New Chromebook features cater to users with limited dexterity and/or tremors, including Bounce Keys, Slow Keys, and Mouse Keys [13]
Workspace Enhancements
- Workspace allows users to embed interactive Google Calendars into websites, with screen-reader compatibility, improved spacing, and responsive layout [14]
What’s New in Google Accessibility | Episode 9
Google· 2025-07-16 14:02
Accessibility Innovations
- Google is releasing SignGemma, an open model for sign language understanding, initially focusing on American Sign Language (ASL) and English, with the potential for community-driven adaptation to other sign languages [1][2]
- Android's TalkBack screen reader now integrates Gemini to provide AI-generated descriptions of the entire screen, enabling conversational follow-up questions [4]
- Expressive Captions on Android now capture the intensity and nuance of speech, including drawn-out sounds and subtle vocalizations like whispering and yawning [5][6]
- The Pixel's Magnifier app introduces live search, allowing blind and low-vision users to type what they're looking for and receive real-time highlights and vibrations when matches are found [6][7]
- Project Astra Visual Interpreter, in collaboration with Aira, is being tested to provide real-time descriptions of surroundings for blind and low-vision users, supervised by live Aira agents [8][9][10]
Chrome and Chromebook Updates
- Chrome now supports Optical Character Recognition (OCR) for scanned PDFs, enabling screen readers to interact with the text [11][12]
- Chromebooks now offer the ability to turn off the touchpad, flash notifications for new alerts, and features like Bounce Keys, Slow Keys, and Mouse Keys to assist users with limited dexterity and/or tremors [12][13]
Workspace Enhancements
- Google Workspace allows users to embed interactive, screen-reader compatible Google Calendars into websites, featuring improved spacing, responsive layouts, and keyboard shortcut navigation [14]
Tencent Research Institute AI Express 20250711
Tencent Research Institute (腾讯研究院)· 2025-07-10 14:48
Group 1
- Musk released Grok4, highlighting its superior performance in various tests, particularly on Humanity's Last Exam, where it surpassed competitors [1]
- Grok4's training approach has shifted to emphasize "first principles" thinking, and the model learns to use tools to solve problems during the training phase [1]
- Grok faces controversy over the "MechaHitler" incident: its unfiltered approach attracts users but also raises concerns about AI alignment [1]
Group 2
- Microsoft open-sourced Phi-4-mini-flash-reasoning, which uses the new SambaY architecture to achieve a 10x increase in inference efficiency and a 2-3x reduction in latency [2]
- The SambaY architecture enables efficient memory sharing across layers without explicit positional encoding, significantly enhancing long-context processing [2]
- The new model suits resource-constrained devices and runs on a single GPU; it excels at advanced mathematical reasoning and long text generation, making it a good fit for education and research [2]
Group 3
- Perplexity officially launched the AI browser Comet, centered on "agent search" and competing with Google Chrome [3]
- Comet's three main value propositions are a personalized understanding of how the user thinks, powerful and user-friendly content comprehension, and efficiency gains from reduced tab switching [3]
- Comet can act on the web on the user's behalf, intelligently process content, manage email and calendars, and search personal data; it currently supports Mac and Windows [3]
Group 4
- OpenAI completed its acquisition of io, with former Apple designer Jony Ive and his team LoveFrom joining to take on deep design and creative responsibilities [4][5]
- Ive is expected to help OpenAI develop new intelligent hardware products, with initial ideas being turned into feasible designs [5]
- io, co-founded by Ive and several experts, includes hardware and software engineers and scientists, and will collaborate closely with OpenAI's R&D team [5]
Group 5
- Google released new medical AI models, the multimodal MedGemma 27B and the lightweight encoder MedSigLIP, expanding the HAI-DEF medical model collection [6]
- The MedGemma series includes 4B and 27B versions that accept image and text input and produce text output; the 4B version achieved 64.4% accuracy in medical Q&A tests, while the 27B version reached 87.7% [6]
- MedSigLIP, with only 400 million parameters, is a medical image encoder optimized across several medical imaging modalities; it is suited to image classification, zero-shot classification, and semantic retrieval, and provides visual understanding for MedGemma [6]
Group 6
- Tencent launched a co-creation activity for the 2026 "Year of the Horse" zodiac penguin; requests surged 300% within hours and token usage doubled, prompting urgent server expansion [7]
- The activity invites users to design the 2026 "Horse Goose" figurine using the Hunyuan 3D AI creation engine, which generates designs from text input, uploaded images, or sketches [7]
- Outstanding works may be co-branded with Tencent for mass production and sold in official merchandise stores; the activity closes on July 27, 2025 [7]
Group 7
- OpenAI plans to release an "open weight model" at roughly the o3 mini level as early as next week; companies will be able to deploy it themselves, and it will be the first model weight release since 2019 [8]
- OpenAI is developing an AI browser based on Chromium that will process web content within the native ChatGPT interface, letting AI agents execute tasks directly and challenging Google Chrome [8]
- OpenAI is expanding from model development to browsers and other user interfaces, signaling its ambition for technological leadership and ecosystem control [8]
Group 8
- Hugging Face and Pollen Robotics jointly launched the open-source robot Reachy Mini, starting at $299 and designed for human-robot interaction and AI experimentation [10]
- Reachy Mini comes in a basic version ($299) and a wireless version ($449); it supports Python programming and is equipped for multimodal interaction with cameras, microphones, and speakers [10]
- The robot stands 28 cm tall, weighs 1.5 kg, offers 15 preset behaviors, and is fully open-source and extensible; the basic version is expected to ship by late summer 2025 and the wireless version in batches starting fall 2025 [10]
Group 9
- Meta released a 40-page report that positions the "mental world model" alongside the physical world model as a key component of embodied intelligence [11]
- The mental world model covers human goals, intentions, emotional states, social relationships, and ways of communicating, enabling AI to understand human psychological states and take part in social interaction [11]
- Meta proposed a dual-system architecture that integrates "observational learning" (System A) and "action learning" (System B): the former supplies abstract knowledge and the latter explores actions, making agent learning more efficient [11]
Group 10
- Top AI products such as Cursor, Perplexity, and Lovable have adopted an "anti-framework" approach, building directly on basic AI units rather than on frameworks [12]
- In the fast-changing AI field, frameworks have become barriers to innovation, leading to excessive abstraction, bloated structures, and slow iteration, while basic units offer composability and specialization [12]
- The basic-unit method (e.g., Memory, Thread, Tools) lets developers assemble AI products like building blocks, reducing cognitive load and improving performance and flexibility, which better suits rapid iteration in AI technology [12]
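The digest names Memory, Thread, and Tools as basic units but gives no code. The following is a rough sketch of what composing such units without a framework might look like. Every class, function, and value here is a hypothetical illustration, not the API of Cursor, Perplexity, Lovable, or any real library, and no model is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Memory:
    """Hypothetical long-lived store of facts, searched by keyword."""
    facts: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, keyword: str) -> List[str]:
        return [f for f in self.facts if keyword.lower() in f.lower()]

@dataclass
class Thread:
    """Hypothetical conversation log for a single session."""
    messages: List[str] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append(f"{role}: {text}")

# Hypothetical tool registry: a plain mapping from name to callable.
Tools = Dict[str, Callable[[str], str]]

def run_turn(user_text: str, thread: Thread, memory: Memory, tools: Tools) -> str:
    """Handle one turn: call a tool if the first word names one, else search memory."""
    thread.add("user", user_text)
    name, _, arg = user_text.partition(" ")
    if name in tools:
        reply = tools[name](arg)
    else:
        hits = memory.recall(user_text)
        reply = hits[0] if hits else "I don't know."
    thread.add("assistant", reply)
    return reply

memory = Memory()
memory.remember("Comet is Perplexity's AI browser")
tools: Tools = {"upper": str.upper}
thread = Thread()

first = run_turn("upper hello", thread, memory, tools)
second = run_turn("comet", thread, memory, tools)
print(first, "|", second)
```

Each unit is an ordinary object with no shared base class, so any one of them can be swapped out without touching the others, which is the composability the digest attributes to this approach.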