Gemma
Tencent Research Institute AI Express 20250711
Tencent Research Institute · 2025-07-10 14:48
Group 1
- Musk released Grok 4, highlighting its strong results across benchmarks, particularly on "Humanity's Last Exam," where it surpassed competitors [1]
- Grok 4's training has shifted toward "first principles" thinking, with the model learning during training to use tools to solve problems [1]
- Grok faces controversy over the "MechaHitler" incident: its unfiltered approach attracts users but also exposes how hard AI alignment is [1]
Group 2
- Microsoft open-sourced Phi-4-mini-flash-reasoning, built on the new SambaY architecture, achieving a 10x increase in reasoning efficiency and a 2-3x reduction in latency [2]
- The SambaY architecture enables efficient memory sharing across layers without explicit positional encoding, significantly improving long-context processing [2]
- The model suits resource-constrained devices and runs on a single GPU; it excels at advanced mathematical reasoning and long-text generation, making it well suited to education and research [2]
Group 3
- Perplexity officially launched Comet, an AI browser built around "agentic search," to compete with Google Chrome [3]
- Comet's three main value propositions are personalized understanding of how the user thinks, powerful and easy-to-use content comprehension, and efficiency gains from less tab switching [3]
- Comet can act on the web on the user's behalf, process content intelligently, manage email and calendars, and search personal data; it currently supports Mac and Windows [3]
Group 4
- OpenAI completed its acquisition of io, with former Apple designer Jony Ive and his team at LoveFrom taking on deep design and creative responsibilities [4][5]
- Ive is expected to help OpenAI develop new intelligent hardware products, with early ideas now being turned into feasible designs [5]
- io, co-founded by Ive and several experts, includes hardware and software engineers and scientists, and will work closely with OpenAI's R&D team [5]
Group 5
- Google released new medical AI models, the multimodal MedGemma 27B and the lightweight encoder MedSigLIP, expanding its HAI-DEF collection of health AI models [6]
- The MedGemma series includes 4B and 27B versions that accept image and text input and produce text output; on a medical Q&A benchmark the 4B version reached 64.4% accuracy and the 27B version 87.7% [6]
- MedSigLIP, with only 400 million parameters, is a medical image encoder tuned on a variety of medical imaging data; it suits image classification, zero-shot classification, and semantic retrieval, and provides visual understanding for MedGemma [6]
Group 6
- Tencent launched a co-creation campaign for its 2026 Year of the Horse zodiac penguin; requests surged 300% within hours and token usage doubled, prompting an urgent server expansion [7]
- Users are invited to design the 2026 "Horse Goose" figurine with the Hunyuan 3D AI creation engine, generating designs from text prompts, uploaded images, or sketches [7]
- Standout designs may be co-branded with Tencent, mass-produced, and sold in official merchandise stores; the campaign closes on July 27, 2025 [7]
Group 7
- OpenAI plans to release an "open-weight model" at roughly the o3-mini level as early as next week, allowing companies to deploy it themselves; it would be OpenAI's first release of model weights since 2019 [8]
- OpenAI is developing a Chromium-based AI browser that processes web content inside a native ChatGPT interface and lets AI agents carry out tasks directly, challenging Google Chrome [8]
- OpenAI is expanding from model development into browsers and other user interfaces, signaling its ambitions for technological leadership and ecosystem control [8]
Group 8
- Hugging Face and Pollen Robotics jointly launched Reachy Mini, an open-source robot starting at $299, designed for human-robot interaction and AI experimentation [10]
- Reachy Mini comes in a basic version ($299) and a wireless version ($449), supports Python programming, and offers multimodal interaction through cameras, microphones, and speakers [10]
- The robot is 28 cm tall, weighs 1.5 kg, ships with 15 preset behaviors, and is fully open-source and extensible; the basic version is expected to ship by late summer 2025, and the wireless version will ship in batches starting in fall 2025 [10]
Group 9
- Meta released a 40-page report positioning the "mental world model" alongside the physical world model as a key component of embodied intelligence [11]
- The mental world model covers human goals, intentions, emotional states, social relationships, and modes of communication, enabling AI to understand human mental states and engage in social interaction [11]
- Meta proposed a dual-system architecture combining "observational learning" (System A) and "action learning" (System B): the former supplies abstract knowledge and the latter explores actions, making agent learning more efficient [11]
Group 10
- Leading AI products such as Cursor, Perplexity, and Lovable have adopted an "anti-framework" approach, building directly on basic AI units (primitives) rather than on frameworks [12]
- In the fast-changing AI field, frameworks have become barriers to innovation, leading to excessive abstraction, bloated structure, and slow iteration, whereas basic units offer composability and specialization [12]
- The basic-unit approach (e.g., Memory, Thread, Tools) lets developers assemble AI products like building blocks, reducing cognitive load, improving performance and flexibility, and keeping pace with rapid AI iteration [12]
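The "basic units" idea in Group 10 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the class names Memory, Thread, and Tools mirror the examples in the text but are hypothetical, not an API from Cursor, Perplexity, Lovable, or any library, and a simple prefix rule stands in for a real model deciding when to call a tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Hypothetical primitive: a small key-value store of facts about the user."""
    facts: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def recall(self, key: str) -> str:
        return self.facts.get(key, "")


@dataclass
class Thread:
    """Hypothetical primitive: an ordered list of chat messages."""
    messages: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


@dataclass
class Tools:
    """Hypothetical primitive: a plain registry of named tool functions."""
    registry: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.registry[name] = fn

    def call(self, name: str, arg: str) -> str:
        return self.registry[name](arg)


def handle(user_input: str, memory: Memory, thread: Thread, tools: Tools) -> str:
    """Wire the primitives together directly, with no framework in between."""
    thread.add("user", user_input)
    # Stand-in for a model's tool-use decision: a simple prefix rule (assumption).
    if user_input.startswith("upper:"):
        reply = tools.call("upper", user_input[len("upper:"):].strip())
    else:
        reply = f"Hello, {memory.recall('name') or 'there'}!"
    thread.add("assistant", reply)
    return reply


if __name__ == "__main__":
    memory, thread, tools = Memory(), Thread(), Tools()
    memory.remember("name", "Ada")
    tools.register("upper", str.upper)
    print(handle("hi", memory, thread, tools))
    print(handle("upper: comet", memory, thread, tools))
    print(len(thread.messages))
```

Because each unit is an ordinary object, replacing the prefix rule with a real model call, or the dictionary with a vector store, changes one piece without touching the others, which is the composability argument the summary makes.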
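A revival of the encoder-decoder architecture? Google releases 32 T5Gemma models at once
Synced (机器之心) · 2025-07-10 08:35
Synced report. Editor: Panda. Today is a big day for xAI: Elon Musk announced early on that the Grok 4 large model would be released today, and the AI community's attention has already converged on it, waiting for his livestream (which took quite a while to start). Of course, given Grok's "out-of-control" behavior over the past few days, plenty of people are also waiting to see it flop. Even so, Google did not seem to mind having the spotlight stolen and pushed out a string of updates to its Gemma model family. First, Google released MedGemma, a series of multimodal models for health AI development, which includes several models in 4B and 27B sizes: MedGemma 4B Multimodal, MedGemma 27B Text, and MedGemma 27B Multimodal. The series can assist diagnosis and offer medical advice based on medical images and text descriptions, and its overall performance is quite good. Hugging Face: https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4 The focus of this article, however, is not MedGemma but the encoder-decoder Gemma models Google released today: T5Gemma. As the name suggests, this ...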
AI Industry Tracking (Overseas): Germany's TNG launches a DeepSeek variant model; DeepSWE open-sources an AI agent
Industry Observation, 2025.07.09. [AI Industry Tracking - Overseas] Germany's TNG launches a DeepSeek variant model; DeepSWE open-sources an AI agent. Industry Research Center. Analyst: 李嘉琪 (Li Jiaqi), 021-38676666, registration no. S0880524040001. Research assistant: 刘峰 (Liu Feng), 021-38676666, registration no. S0880124060013. Summary: tracks the latest industry trends and comments on the industry's newest directions. AI industry news: Dell delivers the first NVIDIA GB300 NVL72 systems to CoreWeave; Meta establishes a superintelligence lab. AI application news: Meta adds AI features to WhatsApp Business; Amazon launches a new AI foundation model to improve robot performance ...
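Industry Observation: [AI Industry Tracking - Overseas] Germany's TNG launches a DeepSeek variant model; DeepSWE open-sources an AI agent
Meta adds AI features to WhatsApp Business; Amazon launches a new AI foundation model to improve robot performance. [AI Industry Tracking - Overseas] Germany's TNG launches a DeepSeek variant model; DeepSWE open-sources an AI agent (Industry Research Center). Summary: tracks the latest industry trends and comments on the industry's newest directions. AI industry news: Dell delivers the first NVIDIA GB300 NVL72 systems to CoreWeave; Meta establishes a superintelligence lab. AI application news: Google launches the Veo 3 video generation model; France's Kyutai open-sources its text-to-speech model Kyutai TTS; the Gemini 2.5 Pro API free quota is upgraded; Google releases Gemini for Education; Claude Code introduces a Hooks feature. AI large-model news: Germany's TNG launches the DeepSeek variant model R1T2; Zhipu open-sources GLM-4.1V-Thinking; DeepSWE open-sources an AI agent; Google open-sources ...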
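cVLA: A Keypose Prediction Method for Efficient Camera-Space VLA Models
具身智能之心 (Embodied Intelligence Hub) · 2025-07-06 11:54
Authors: Max Argus et al. Editor: 具身智能之心. Preface: Vision-language-action (VLA) models provide a powerful framework for complex robotic manipulation tasks, but they are often expensive to train. This work proposes a new VLA approach that exploits the strong performance of vision-language models (VLMs) on 2D images to directly infer the pose of the robot's end effector in image-frame coordinates. Unlike previous VLA models that output low-level control commands, this model predicts trajectory waypoints, which makes training more efficient and keeps the model agnostic to the robot embodiment. Despite its lightweight design, its next-token prediction architecture still learns meaningful, executable robot trajectories. The work also explores the potential of depth images, inference techniques such as decoding strategies, and demonstration-conditioned action generation. Trained on a simulated dataset, the model shows good sim-to-real transfer, and evaluations combining simulated and real data demonstrate its effectiveness on a real robot system. 1. Introduction: VLA models fuse vision, language, and interaction data to achieve fine-grained perception and action generation, and can solve a wide range of tasks. However, V ...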
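To make "inferring the end-effector pose in image-frame coordinates" with a next-token architecture more concrete, the sketch below projects a 3D end-effector waypoint into pixels with a standard pinhole camera model and quantizes the pixel coordinates into integer bins that could serve as discrete tokens. It is an illustration only, not cVLA's actual tokenization: the camera intrinsics (fx, fy, cx, cy), the 1024-bin vocabulary, and all function names are assumptions, and orientation is ignored.

```python
from typing import List, Tuple


def project_to_pixels(point_cam: Tuple[float, float, float],
                      fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D point expressed in the camera frame (meters)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def pixels_to_tokens(u: float, v: float, width: int, height: int,
                     bins: int = 1024) -> List[int]:
    """Quantize pixel coordinates into integer bins usable as discrete 'location tokens'."""
    def quantize(value: float, size: int) -> int:
        clipped = min(max(value, 0.0), float(size))  # keep the point inside the image
        return min(int(clipped / size * bins), bins - 1)
    return [quantize(u, width), quantize(v, height)]


def tokens_to_pixels(tokens: List[int], width: int, height: int,
                     bins: int = 1024) -> Tuple[float, float]:
    """Map bins back to bin-center pixel coordinates (error at most half a bin)."""
    tu, tv = tokens
    return (tu + 0.5) * width / bins, (tv + 0.5) * height / bins


if __name__ == "__main__":
    # Assumed intrinsics for a 640x480 camera; a waypoint 0.5 m in front of it.
    u, v = project_to_pixels((0.1, -0.05, 0.5), fx=500, fy=500, cx=320, cy=240)
    tokens = pixels_to_tokens(u, v, 640, 480)
    print((u, v), tokens, tokens_to_pixels(tokens, 640, 480))
```

Applied per waypoint, this turns a trajectory into a short token sequence a language-model decoder could emit; the bin count trades vocabulary size against pixel precision.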
UCL's Echo Zhang: AIGC is a mirror that reflects creativity, values, and trust
Huan Qiu Wang Zi Xun· 2025-07-06 06:39
Core Viewpoint - The emergence of Generative Artificial Intelligence (AIGC) represents not only a technological revolution but also a reflection of human creativity, values, and trust, emphasizing the need for a humanistic approach to guide technology in serving humanity [2][5]. Group 1: AIGC Definition and Evolution - AIGC is defined as algorithms capable of generating text, images, music, and videos, exemplified by tools like ChatGPT, Midjourney, and DALL·E [2]. - The evolution of AI has progressed through several waves: from symbolic reasoning and rule-based systems to statistical learning, deep learning breakthroughs, and now to AIGC as a collaborative partner rather than just an auxiliary tool [3]. Group 2: Cultural Impact - AIGC is not merely a technical phenomenon; it has become a "cultural software" that reshapes how culture is expressed and defined in the digital age [3]. - The rise of AI-generated content raises questions about originality and the emotional and cultural value of rapidly produced works, echoing concerns raised by philosopher Walter Benjamin regarding mechanical reproduction [3]. Group 3: Applications in Education - AIGC has transformed education by providing personalized, scalable, and adaptive learning experiences, such as AI-assisted tutoring and dynamically generated learning materials [4]. - However, challenges include potential over-reliance on AI by students, which may weaken critical thinking skills, and the risk of exacerbating the digital divide due to uneven technology distribution [4]. Group 4: Applications in Healthcare - In healthcare, AIGC has demonstrated effectiveness through AI-generated diagnostic reports and image analysis tools, enhancing diagnostic efficiency and supporting clinical decision-making [4]. - Notable developments include specialized large language models like Google DeepMind's MedGemma and SenseTime's "Da Yi" model, which assist in diagnosis and patient communication [4]. 
Group 5: Societal Challenges - AIGC poses significant societal challenges, including information pollution, the ambiguity of copyright in creative industries, and potential job displacement in various sectors [5]. - There is a growing concern about a "crisis of trust" as distinguishing between true and false content becomes increasingly difficult, highlighting the need for responsible guidance in shaping AI's role in society [5].
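LeCun's team reveals the nature of semantic compression in LLMs: extreme statistical compression sacrifices detail
QbitAI (量子位) · 2025-07-04 01:42
时令 (Shi Ling), reporting from 凹非寺 | QbitAI (WeChat official account). When we read the words "apple," "banana," and "watermelon," we instinctively group them as "fruit," even though they differ in color, shape, and taste. Even on first encountering the word "dragon fruit," we can use semantic cues to infer that it is probably a fruit too. This ability, called semantic compression, lets us organize knowledge efficiently and categorize the world quickly. So the question is: large language models (LLMs) have impressive language abilities, but can they make the same trade-offs in semantic compression that humans do? To explore this, the team of Turing Award winner LeCun proposed a new information-theoretic framework. By comparing the strategies humans and LLMs use for semantic compression, the framework reveals a fundamental difference between the two in how they balance compression efficiency against semantic fidelity: LLMs favor extreme statistical compression, whereas humans care more about detail and context. Semantic compression comparison framework: empirically studying how LLM representations relate to human conceptual structure requires two key ingredients. A robust human concept-categorization benchmark: building on three classic studies in cognitive science (Rosch 1973, 1975 and McCloskey & Glucksberg 1978), the team constructed a unified benchmark covering 1,049 items across 34 semantic categories. These data provide not only category membership but also human ratings of each item's "typicality," reflecting human ...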
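The compression-versus-fidelity trade-off can be illustrated with a toy rate-distortion-style score: the entropy of the category labels measures how compressed a grouping is, and the mean squared distance from each item to its category centroid measures how much detail is lost. This is a generic sketch assuming 2D toy embeddings and a weighting factor `beta`; it is not the exact objective used by LeCun's team, and the item vectors are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def complexity_bits(labels: List[str]) -> float:
    """Entropy H(C) of the category labels in bits (fewer, larger categories = more compression)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def distortion(embeddings: Dict[str, List[float]], assignment: Dict[str, str]) -> float:
    """Mean squared Euclidean distance from each item to its category centroid (lost detail)."""
    groups: Dict[str, List[List[float]]] = {}
    for item, label in assignment.items():
        groups.setdefault(label, []).append(embeddings[item])
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total / len(assignment)


def tradeoff(embeddings: Dict[str, List[float]], assignment: Dict[str, str],
             beta: float) -> float:
    """Lower is better: bits spent on categories plus beta times the detail that is lost."""
    return complexity_bits(list(assignment.values())) + beta * distortion(embeddings, assignment)


if __name__ == "__main__":
    # Toy 2D "embeddings" (assumed): two fruits close together, two tools far away.
    emb = {"apple": [0.0, 0.0], "banana": [0.0, 1.0], "hammer": [10.0, 0.0], "saw": [10.0, 1.0]}
    options = {
        "one category": {k: "thing" for k in emb},
        "fruit vs tool": {"apple": "fruit", "banana": "fruit", "hammer": "tool", "saw": "tool"},
        "every item alone": {k: k for k in emb},
    }
    for beta in (0.01, 0.1):
        best = min(options, key=lambda name: tradeoff(emb, options[name], beta))
        print(beta, best)
```

With a small `beta` the single coarse category wins (maximal compression, most detail discarded); with a larger `beta` the fruit/tool split wins because it keeps the meaningful distinction. This loosely mirrors the contrast the article draws between aggressive statistical compression and detail-preserving categorization.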
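Behind the tech giants going open source: a price war or a value war?
AI科技大本营 (AI Tech Base Camp) · 2025-07-02 09:30
As tech giants open-source their own models one after another, what game are they playing behind the scenes? In this "open-source" era, how should we understand the strategic differences between the giants, especially Baidu and Google, two companies built around "search + large models," and what strategic thinking do those differences reflect? At 19:30 on July 2, CSDN's "AI Evolution" (《AI 进化论》) program invited CSDN founder and chairman 蒋涛 (Jiang Tao); 章文嵩 (Zhang Wensong), deputy director of the CCF Open Source Development Committee and founder of LVS; and 程勇 (Cheng Yong), vice president of Guangzhou Bingo Software Co., Ltd. (广州品高软件股份有限公司), for an in-depth conversation on "the new landscape and new developments under the open-source AI wave," tackling the real issues behind the open-source ecosystem: the giants' strategies, survival rules for startups, changes in the open-source model, and the next opportunities for AI developers... Here is a preview of the detailed outline of the livestream conversation: Topic 1: the giants' game behind open source and the industry's endgame; Topic 2: the new open-source landscape in the AI era. Looking globally, when Google released its flagship model Gemini 2.5 Pro, it chose to open-source the derived lightweight model Gemma, while Meta's LLaMA, though nominally open source, has always worn the "shackles" of commercial restrictions... These companies cautiously attract developers worldwide through openness while keeping firm control over core capabilities and monetization paths. Back in China, from Alibaba's Tongyi Qianwen open-sourcing models in every size to DeepSeek's stunning debut ...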
Fourth Global Top Universities "Gen Z" Leaders Online Dialogue held: Chinese and international youth discuss AI applications
Huan Qiu Wang Zi Xun· 2025-07-02 03:25
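Hua Xiaowen (华小文) of Shanghai Jiao Tong University traced the evolution of educational technology, reviewing online education's leap from "teaching by telephone" to "online platforms" and then to "VR + brainwave sensor" instruction. She stressed that technology should not replace teachers but should strengthen learners' individual expression and the development of multiple intelligences. She noted that countries such as Finland have introduced AI courses in primary and secondary schools, encouraging students to apply what they learn to global issues such as the Sustainable Development Goals (SDGs) and global climate change. She also warned of "technology addiction" and called for education apps that foster "positive addiction," such as the language-learning app Duolingo. Hua concluded: "Technology should serve creativity, collaboration, and critical thinking, not breed laziness and division." On June 30, the fourth Global Top Universities "Gen Z" Leaders Online Dialogue was held. The event brought together more than 40 youth representatives from 15 world-renowned universities, including Shanghai Jiao Tong University, The Hong Kong Polytechnic University, University College London, the University of California, Berkeley, the University of Melbourne, the University of Auckland, and the University of Sydney, who met online with experts in related fields for in-depth discussions on the theme "AI Technology and Future Applications." In the youth dialogue session, the "Gen Z" representatives exchanged ideas from cross-disciplinary perspectives, speaking freely on topics such as frontier AI technology and social development. Yang Jian (杨建), a young scholar in AI and multilingual large models and a former core researcher on Alibaba's Tongyi team, spoke on the theme "Everyone Can Program," offering an in-depth analysis of code intelligence tech ...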
Computer Industry Weekly: Google releases new multimodal large model Gemma 3n; Alibaba DAMO Academy releases medical AI model DAMO GRAPE (20250630)
Huaxin Securities· 2025-06-30 12:43
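June 30, 2025. Google releases new multimodal large model Gemma 3n; Alibaba DAMO Academy releases medical AI model DAMO GRAPE. Computer Industry Weekly. Rating: Recommended (maintained). Investment highlights. Analyst: 宝幼琛 (Bao Youchen), S1050521110002, baoyc@cfsc.com.cn.

Industry relative performance:

| Performance | 1M | 3M | 12M |
| --- | --- | --- | --- |
| Computer (Shenwan) | 6.0 | 0.1 | 49.7 |
| CSI 300 | 2.1 | 0.9 | 13.3 |

Related research: 1. "Computer Industry Commentary: Uber (UBER.O): strategic technology push builds moats, ecosystem breakthrough opens a new chapter" 2025-06-28; 2. "Computer Industry Weekly: Huawei releases Pangu Model 5.5, MiniMax 'release week' kicks off" 2025-06-23; 3. "Computer Industry Commentary: Pony.ai (PONY.O): AI drives autonomous-driving ecosystem synergy, global expansion brings the profitability inflection point closer" 2025-06-22.

▌Computing power: Google releases the new multimodal large model Gemma 3n, suited to running on edge devices. On June 27, Google officially released and open-sourced Gemma 3n, a new on-device multimodal large model. According to Google ...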