量子位
Google is feeling generous! Its coding-agent power move kept simple: open-source and free, with a 1M-token context window, multimodality, and full MCP support
量子位· 2025-06-26 02:11
鱼羊 from 凹非寺 — 量子位 | Official Account QbitAI

Open-source and free! Google has made its move on coding agents, launching Gemini CLI, which puts Gemini right in your terminal with what it calls the "highest free usage limits in the industry": Gemini 2.5 Pro with a 1-million-token context window, 60 model requests per minute, and up to 1,000 requests per day.

The official announcement is also careful to note: Gemini CLI is strong at writing code, but it is not only for programming. Call Veo and Imagen, and generating video from the command line takes a single sentence.

The moment the news dropped, discussion exploded, and the GitHub repo's star count rocketed to 10.8k overnight. Some netizens went straight to writing RIP for Cursor. And against the most direct rivals, Claude Code and OpenAI Codex CLI, "free" trumps everything (doge). In short, deep-pocketed Google is going all in this time.

Using Gemini from the command line: a closer look at Gemini CLI. Officially, its capabilities span code understanding, file operations, command execution, and dynamic troubleshooting. Put simply, with Gemini CLI you can direct the Gemini model in natural language to write code and debug, right in the terminal. More concrete use cases include, on the coding side: tools ...
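For anyone scripting the CLI rather than using it interactively, those free-tier numbers (60 requests per minute, 1,000 per day) are worth budgeting against. Below is a minimal client-side throttle in Python — a sketch, assuming the CLI's non-interactive `-p` prompt flag (confirm with `gemini --help` on your installed version):

```python
import subprocess
import time
from collections import deque

# Free-tier limits quoted above: 60 requests/minute, 1,000/day.
PER_MINUTE, PER_DAY = 60, 1000

minute_window: deque = deque()  # monotonic timestamps of recent calls
calls_today = 0

def ask_gemini(prompt: str) -> str:
    """Throttled one-shot call to the Gemini CLI.

    Assumes the CLI accepts `-p` for non-interactive prompts; confirm
    with `gemini --help` on your installed version.
    """
    global calls_today
    if calls_today >= PER_DAY:
        raise RuntimeError("daily free-tier quota exhausted")
    now = time.monotonic()
    while minute_window and now - minute_window[0] > 60:
        minute_window.popleft()       # drop calls older than the window
    if len(minute_window) >= PER_MINUTE:
        time.sleep(60 - (now - minute_window[0]))  # wait out the window
        minute_window.popleft()
    minute_window.append(time.monotonic())
    calls_today += 1
    result = subprocess.run(
        ["gemini", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout

if __name__ == "__main__":
    print(ask_gemini("Summarize this repo's build system in two sentences."))
```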
MIT tenured professor Kaiming He has joined Google
量子位· 2025-06-26 02:11
Core Viewpoint
- Kaiming He, a prominent figure in computer vision, has recently joined Google DeepMind as a part-time distinguished scientist after obtaining tenure at MIT, indicating a strategic collaboration between academia and industry in AI research [1][2][5][7].

Group 1: Kaiming He's Career and Achievements
- Kaiming He is recognized as a legendary figure in the computer vision field, having received his undergraduate degree from Tsinghua University and his PhD from the Chinese University of Hong Kong under the supervision of Xiaoou Tang [9][10].
- He co-authored "Single Image Haze Removal Using Dark Channel Prior," which won the CVPR 2009 best paper award, a landmark achievement for Asian researchers at the conference [10].
- After completing his PhD in 2011, he worked at Microsoft Research Asia, where he developed the influential ResNet architecture, now cited over 280,000 times (see the sketch below), and later joined Facebook AI Research (FAIR) [11][12][15].
- His research contributions include notable works like Faster R-CNN and Mask R-CNN, the latter winning the best paper award at ICCV 2017 [15][18].

Group 2: Recent Developments and Collaborations
- Kaiming He joined MIT's EECS department in 2023, a return to academia after a long industry tenure that drew attention and speculation about Meta's loss [16][18].
- His recent research focuses on model performance optimization, including advancements in image generation techniques and the development of highly compressed tokenizers [20].
- He has collaborated with Google DeepMind on various projects, including the paper "Fractal Generative Models," which introduced a new paradigm for generating high-resolution images [22][23].
- The collaboration with DeepMind has been ongoing, with previous joint efforts addressing challenges in visual autoregressive models and proposing solutions for scaling these models [25][27].
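For readers who have not met ResNet: its core idea is the residual connection, where a block learns a correction F(x) that is added back onto its input, y = F(x) + x, keeping very deep networks trainable. A minimal PyTorch-style sketch of that idea (illustrative only, not He's original code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = relu(F(x) + x).

    Illustrative sketch of the idea behind ResNet (He et al., 2015),
    not a reproduction of the original implementation.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                      # identity shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)  # the key addition: F(x) + x

if __name__ == "__main__":
    block = ResidualBlock(64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

Because the shortcut lets each block default to the identity, stacking many blocks cannot make the network worse than a shallower one — the insight that made 100+ layer networks practical.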
Jensen Huang's fresh cut of the knife: RTX 5050 officially announced
量子位· 2025-06-25 08:12
闻乐 from 凹非寺 — 量子位 | Official Account QbitAI

NVIDIA's brand-new RTX 5050 desktop and laptop GPUs are official! Launch is set for July, and the suggested retail price in China is out: from 2,099 RMB (400 RMB cheaper than the RTX 5060).

GeForce RTX 5050 laptops are already shipping from OEMs worldwide, with suggested retail prices starting at 7,499 RMB; system builders and integrators will also offer desktops carrying the card.

Skipping straight past the RTX 4050 (there never was a desktop RTX 4050), the new card pairs with DLSS 4 multi-frame generation and is claimed to push the ray-traced frame rate in Cyberpunk 2077 past 150 fps. Compared with the best-selling RTX 3050, the GeForce RTX 5050's rasterization performance at 1080p is on average 60% higher, and in games supporting the full DLSS 4 stack it delivers up to 4x the performance.

That said, the memory configuration of this launch is a "mix of old and new": the 2,099 RMB entry-level desktop card carries last-generation GDDR6, while the 7,499 RMB laptop part leapfrogs to GDDR7. What is Jensen Huang playing at with this round of "precision knife work"? Spec details: slicing the memory into tiers so that players agonize between "good enough" and "play in style" ...
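The "up to 4x" figure is easiest to read as rendered frames plus AI-generated frames. A back-of-the-envelope sketch, assuming multi-frame generation inserts up to three AI frames per rendered frame (the marketing number also folds in upscaling gains, so treat this as illustrative arithmetic, not a benchmark):

```python
# Back-of-the-envelope: how multi-frame generation multiplies displayed fps.
# Assumption: up to 3 AI-generated frames per rendered frame; the "up to 4x"
# claim also includes upscaling, so this is illustrative only.

def displayed_fps(rendered_fps: float, generated_per_rendered: int) -> float:
    return rendered_fps * (1 + generated_per_rendered)

base = 40.0  # hypothetical ray-traced render rate without DLSS
for n in range(4):
    print(f"{n} generated frames per rendered frame -> {displayed_fps(base, n):.0f} fps")
# With n=3, a 40 fps render rate becomes 160 fps displayed -- past the 150 fps claim.
```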
HUST alumni sprint toward Hong Kong's first AI-infra listing: already China's highest-earning independent edge-cloud provider, backed by Wang Xiaochuan since the angel round
量子位· 2025-06-25 08:12
Core Viewpoint
- PPIO is positioned to become the first AI-infrastructure stock in Hong Kong, aiming to capitalize on the growing demand for AI cloud computing services and edge cloud computing solutions [1][4].

Company Overview
- PPIO, an independent distributed cloud computing service provider, has recently submitted its prospectus to the Hong Kong Stock Exchange [2].
- The company was founded by two alumni of Huazhong University of Science and Technology who previously worked together on the PPTV platform [3][31].

Business Segments
- PPIO's business is divided into two main segments: edge cloud computing services and AI cloud computing services [5][11].
- Edge cloud computing services deploy computing resources closer to where data is generated, increasing processing speed and reducing latency [6][7].
- AI cloud computing services include GPU cloud services and model APIs, letting users access high-performance computing resources on demand [11][12].

Service Details
- Edge cloud computing services are further categorized into edge node services and edge CDN, with products such as edge containers and edge bare-metal servers [8][9].
- AI cloud computing services offer immediate access to mainstream open-source large models and allow clients to deploy their custom models on PPIO's infrastructure [12][14].

Market Position and Growth
- As of the end of 2024, PPIO had established the largest computing-power network in China, with over 4,000 computing nodes [15].
- The number of registered developers for its AI cloud computing services surged from 12,112 in 2023 to 295,524 by May 2024, a growth rate of 936.5% [18].

Financial Performance
- PPIO has become the highest-revenue independent edge cloud computing service provider in China, with a 4.1% market share in 2024 [20].
- Revenue grew at a compound annual rate of 39.7% from 2022 to 2024, reaching 286 million, 358 million, and 558 million RMB in the three years respectively (verified in the snippet after this summary) [20][21].
- The company is still loss-making, with losses widening from 85 million RMB in 2022 to 294 million RMB in 2024 [24].

Research and Development
- R&D expenses have been significant, accounting for 14.5% to 18.9% of total revenue over the past three years, totaling 86 million RMB in 2024 [26][28].
- The company maintains a large R&D team: 67.6% of its 204 employees work in research and development [30].

Cash Position and Future Plans
- As of the end of 2024, PPIO held 1.13 billion RMB in cash and cash equivalents, a solid financial cushion for future operations [30].
- IPO proceeds will be allocated to enhancing technical capabilities, upgrading its multi-modal API platform, expanding market share, and pursuing international expansion [30].

Industry Outlook
- The global edge cloud computing services market is projected to grow from 185.1 billion RMB in 2024 to 500.3 billion RMB by 2029, while the AI cloud computing services market is expected to reach 31.5 billion RMB in 2024 and grow at a compound annual rate of 68.5% from 2024 to 2029 [51].
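The revenue CAGR is straightforward to verify from the disclosed figures — two compounding periods from 2022 to 2024:

```python
# Verify the prospectus CAGR from the disclosed revenue figures (RMB millions).
revenue = {2022: 286, 2023: 358, 2024: 558}

years = max(revenue) - min(revenue)                      # 2 compounding periods
cagr = (revenue[2024] / revenue[2022]) ** (1 / years) - 1
print(f"2022-2024 revenue CAGR: {cagr:.1%}")             # -> 39.7%, matching the filing
```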
Altman responds to the OpenAI hardware "plagiarism" claim: the investment talks fell through, so they bit back! A new round of email evidence surfaces
量子位· 2025-06-25 08:12
鹭羽 from 凹非寺 — 量子位 | Official Account QbitAI

The year's most dramatic tech-industry soap opera just got a new episode — possibly a major plot twist? (doge) Altman has responded to the OpenAI hardware "plagiarism-gate": IYO kept seeking investment or an acquisition, which is cool, but turning to a lawsuit when you don't get what you want is not cool at all. He called the trademark-infringement suit against OpenAI "silly," "disappointing," and "completely wrong." He even claimed that just days before the suit was filed, IYO founder Jason Rugolo (the other lead in this hair-pulling match) was still asking to be acquired...

So is this a failed acquisition turned to spite? Netizens declared: what a twist — brilliant, truly brilliant! Jason Rugolo, though, immediately showed up in the comments to push back, saying he resents Altman holding court online; he just wants to compete fairly on product and get his product's name back. That fully ignited the gossip, and even Musk, snacking on melon seeds off to the side, got dragged into the fray. As one netizen put it: bro, did you forget your own lawsuit with Musk? Karma.

Meanwhile, amid all the hardware squabbling, ChatGPT's newest features are still on the way... new collaboration and chat functions, prompting netizens to say: now this is Altman's home turf.

First, a recap for viewers unfamiliar with "plagiarism-gate": earlier this month, OpenAI spent a fortune with former Apple chief desi...
AI is starting to master the core mechanism of human creativity | Peking University, CogSci 2025 (Oral)
量子位· 2025-06-25 05:00
Core Viewpoint
- The article discusses a recent study by a team from Peking University that reveals AI's ability to understand and replicate human-like creativity, specifically through a framework for evaluating combinational creativity in AI models [1][3][11].

Group 1: AI's Creative Capabilities
- Advanced models like GPT-4 have surpassed average human performance in creative understanding tasks, achieving an accuracy of 70% compared to 50% for average humans [2][21].
- The study introduces a systematic framework for quantifying AI's combinational creativity, which enhances AI's creative generation capabilities [3][12].
- AI is learning to engage in "combinational creativity," understanding the deeper meanings behind seemingly unrelated elements, akin to human artists [7][9].

Group 2: The IEI Framework
- The research team developed the IEI framework (Identification–Explanation–Implication) to assess AI's combinational creativity, breaking it down into three levels: identification of basic elements, explanation of functional relationships, and implication of deeper meanings (see the sketch after this summary) [13][17].
- The framework not only evaluates AI but also provides new insights for the computational study of human creativity [14][28].
- Integrating the IEI framework into generative models like DALL-E 3 improved the quality of creative outputs by 35%, indicating that AI creativity can be optimized through structured thinking [23][24].

Group 3: Comparative Analysis
- While GPT-4 and other advanced models have outperformed average humans, they still lag behind human experts in deep semantic interpretation, with experts achieving an average accuracy of 78% [21][22].
- The study highlights the importance of understanding the creative process rather than just the novelty and utility of the results, which has been the focus of traditional assessments [12][11].

Group 4: Practical Applications
- The research provides a methodology for AI's creative applications, demonstrating how AI can express abstract properties through the recombination of animal features [28][33].
- AI's potential in product design is illustrated by its ability to creatively combine everyday items with symbolic objects, showcasing its application across industries [41].
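The summary describes the three IEI levels but no reference code; the staged structure maps naturally onto a chained prompting pipeline. A hypothetical sketch (stage prompts, names, and the stubbed model call are all assumptions, not the Peking University team's implementation):

```python
# Hypothetical sketch of an IEI (Identification-Explanation-Implication)
# probe. Stage prompts, names, and the stubbed model call are illustrative
# assumptions, not the authors' actual code.

IEI_STAGES = [
    ("identification", "List the basic visual elements combined in this artwork."),
    ("explanation", "Explain how those elements functionally relate to each other."),
    ("implication", "What deeper meaning does the combination convey?"),
]

def query_model(prompt: str, image_ref: str) -> str:
    # Stub: swap in a real multimodal-model API call here.
    return f"[answer for {image_ref!r}: {prompt[:30]}...]"

def iei_probe(image_ref: str) -> dict[str, str]:
    """Run the three IEI levels in order; deeper levels see earlier answers."""
    answers: dict[str, str] = {}
    context = ""
    for stage, question in IEI_STAGES:
        answers[stage] = query_model(context + question, image_ref)
        context += f"{stage}: {answers[stage]}\n"  # feed shallower answers forward
    return answers

if __name__ == "__main__":
    for stage, answer in iei_probe("pipe_and_apple.png").items():
        print(f"{stage}: {answer}")
```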
Humanoid robots bridge the gap between visual perception and motion for the first time; a UC Berkeley Chinese PhD student demos it live on a Unitree G1
量子位· 2025-06-25 05:00
Core Viewpoint
- The article discusses the LeVERB framework developed by teams from UC Berkeley and Carnegie Mellon University, which enables humanoid robots to understand language commands and perform complex actions in new environments without prior training [1][3].

Group 1: LeVERB Framework Overview
- The LeVERB framework bridges the gap between visual-semantic understanding and physical movement, allowing robots to perceive their environment and execute commands like humans [3][12].
- The framework consists of a hierarchical dual system that uses a "latent action vocabulary" as the interface connecting high-level understanding and low-level action execution (sketched after this summary) [17][20].
- The high-level component, LeVERB-VL, processes visual and language inputs to generate abstract commands, while the low-level component, LeVERB-A, translates these commands into executable actions [23][24].

Group 2: Performance and Testing
- The framework was tested on the Unitree G1 robot, achieving an 80% zero-shot success rate on simple visual navigation tasks and an overall task success rate of 58.5%, outperforming traditional methods by 7.8 times [10][36].
- LeVERB-Bench, a benchmark for humanoid robot whole-body control (WBC), includes over 150 tasks and aims to provide realistic training data for vision-language-action models [7][26].
- The benchmark features diverse tasks such as navigation, reaching, and sitting, with a total of 154 visual-language tasks and 460 language-only tasks, generating extensive realistic motion-trajectory data [30][31].

Group 3: Technical Innovations
- The framework employs techniques like ray tracing for realistic scene simulation and motion-capture data to enhance the quality of training datasets [27][30].
- Training optimizes the model through trajectory reconstruction and adversarial classification, ensuring efficient processing of visual-language information [23][24].
- Ablation studies indicate that components like the discriminator and kinematic encoder are crucial for maintaining model performance and enhancing generalization [38].
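The architectural point — a slow vision-language module writing a latent vector that a fast control policy reads — can be made concrete with a toy sketch. Class names, dimensions, and the placeholder math below are assumptions for illustration, not the real LeVERB components:

```python
import numpy as np

# Schematic of LeVERB's hierarchical split: a slow vision-language module
# (cf. LeVERB-VL) emits a latent "action word"; a fast low-level policy
# (cf. LeVERB-A) decodes it into joint commands. Shapes/names are assumptions.

LATENT_DIM, PROPRIO_DIM, NUM_JOINTS = 32, 48, 29

class HighLevelVL:
    """Slow loop: image + instruction -> latent action vector."""
    def plan(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # Placeholder for the real vision-language encoder.
        rng = np.random.default_rng(abs(hash(instruction)) % 2**32)
        return rng.standard_normal(LATENT_DIM)

class LowLevelPolicy:
    """Fast loop: latent vector + proprioception -> joint targets."""
    def __init__(self):
        self.w = np.zeros((NUM_JOINTS, LATENT_DIM + PROPRIO_DIM))
    def act(self, latent: np.ndarray, proprio: np.ndarray) -> np.ndarray:
        return self.w @ np.concatenate([latent, proprio])  # linear stand-in

vl, policy = HighLevelVL(), LowLevelPolicy()
latent = vl.plan(np.zeros((224, 224, 3)), "walk to the chair and sit down")
for _ in range(50):  # the fast loop reuses the latent between slow replans
    joint_targets = policy.act(latent, np.zeros(PROPRIO_DIM))
print(joint_targets.shape)  # (29,) joint commands per control tick
```

The design choice this illustrates: the two loops only share the latent vector, so the vision-language side can run at a low rate while control stays real-time.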
This AI can save lives! It spots gastric cancer lesions 6 months early, breaking through the limits of medical imaging — DAMO Academy pulled it off
量子位· 2025-06-25 05:00
衡宇 from 凹非寺 — 量子位 | Official Account QbitAI

A new AI advance — the life-saving kind. Now, a single routine CT scan from a standard health checkup, analyzed by AI, may catch cancer before it shows any obvious symptoms — say, six months early.

Today, a related domestic result appeared in the top international journal Nature Medicine: DAMO GRAPE, the world's first AI model to identify early gastric cancer from plain (non-contrast) CT scans. It breaks through the limits of traditional imaging for the first time, making it possible to detect gastric cancer on ordinary, non-enhanced CT.

Most of us ordinary people blanch at the mere mention of cancer. Among all cancers, gastric cancer is not only one of China's most common malignancies, it also kills an enormous number of people — roughly 260,000 per year, ranking third among all malignant tumors. Yet a gastric cancer diagnosis is not necessarily a death sentence: if it is found and resected early, the 5-year survival rate reaches 95%–99%, with a real chance of complete cure. Early gastric cancer, however, has no specific symptoms that distinguish it from ordinary gastritis, and China's early detection rate has long hovered between 20% and 30%. The medical mainstream's current screening approach is "questionnaire + gastroscopy": fill out a standardized risk-assessment questionnaire, then send the high- and medium-risk groups it flags for gastroscopic screening.

In practice, DAMO GRAPE was validated in a large-scale clinical study of nearly 100,000 people across 20 centers nationwide, which demonstrated ...
Robot vision-language navigation enters the R1 era! HKU and Shanghai AI Lab propose a brand-new embodied-intelligence framework
量子位· 2025-06-25 00:33
Core Insights
- The article discusses advances in vision-language navigation, specifically the VLN-R1 model developed by the University of Hong Kong and Shanghai AI Lab, which enables robots to navigate complex environments from natural-language instructions without relying on discrete maps [1][3].

Group 1: Performance and Efficiency
- VLN-R1 performs strongly on the VLN-CE benchmark, surpassing the results of larger models with only a 2-billion-parameter model after RFT training [2].
- In long-distance navigation tasks, VLN-R1 shows "cross-domain transfer": after pre-training on R2R, it achieves superior performance with only 10,000 RxR samples, highlighting its data efficiency [2][15].

Group 2: Innovation in Navigation
- The core challenge of vision-language navigation (VLN) is to enable agents to autonomously complete navigation tasks from natural-language commands while integrating real-time visual perception [3].
- Traditional navigation systems rely on discrete topological maps, limiting their adaptability to complex environments and dynamic changes [4][5].

Group 3: Training Mechanisms
- VLN-R1 employs a two-stage training approach combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) to strengthen decision-making [7].
- The model uses group relative policy optimization (GRPO) to generate multiple action plans for the same instruction, optimizing the policy from their relative performance [7].
- A time-decay reward (TDR) mechanism prioritizes immediate actions, ensuring the model focuses on current obstacles before planning future steps (see the sketch after this summary) [8][9].

Group 4: Data Set and Memory Management
- The VLN-Ego dataset, created with the Habitat simulator, includes 630,000 R2R and 1.2 million RxR training samples, emphasizing first-person perspectives and real-time decision-making [12].
- A long-short-term memory sampling strategy balances recent experience with long-term memory, allowing the model to respond effectively to sudden changes in the environment [14].

Group 5: Future Implications
- The research indicates that the key to embodied intelligence lies in creating a closed-loop learning system that mimics human perception, decision-making, and action [16].
- The open release of the VLN-Ego dataset and training methods enhances reproducibility and scalability, promoting AI's transition from "digital intelligence" to "embodied cognition" across applications [16].
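The time-decay reward is simple to state: credit for matching the reference action is weighted more heavily at near-term steps than at distant ones. A sketch under the assumption of exponential decay (the paper's exact weighting schedule may differ):

```python
# Sketch of a time-decay reward (TDR): matches with the reference action
# are worth more at near-term steps. Exponential decay is an assumption;
# the paper's exact schedule may differ.

def tdr(predicted: list[int], reference: list[int], gamma: float = 0.8) -> float:
    """Sum of gamma**t over steps t where the predicted action matches."""
    return sum(
        gamma**t
        for t, (p, r) in enumerate(zip(predicted, reference))
        if p == r
    )

# The first plan nails the immediate steps, the second only the distant ones:
print(tdr([0, 1, 9, 9], [0, 1, 2, 3]))  # 1.8   (t=0,1 correct)
print(tdr([9, 9, 2, 3], [0, 1, 2, 3]))  # 1.152 (t=2,3 correct, discounted)
```

Under GRPO-style group comparison, the plan that gets the near-term steps right scores higher, which is exactly the "handle the current obstacle first" behavior the mechanism is meant to induce.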
Google releases an on-device embodied-intelligence model! It performs fine-grained manipulation fully offline, covering everything from humanoid to industrial robots
量子位· 2025-06-25 00:33
Core Viewpoint
- Google DeepMind has launched the Gemini Robotics On-Device model, which allows robots to operate with a local "offline brain," enhancing their capabilities and reducing reliance on cloud computing [2][4].

Group 1: Model Capabilities
- The Gemini Robotics On-Device model runs offline while maintaining strong operational capabilities, allowing robots to follow instructions and perform tasks requiring precision [3][4].
- The model supports deployment on various robotic platforms, from humanoid robots to industrial dual-arm robots, with significantly reduced response latency [4].
- The On-Device version outperforms previous local models in handling out-of-distribution tasks and complex multi-step instructions [8].

Group 2: Adaptability and Training
- The model's adaptability is a key strength: it adjusts quickly to new tasks with minimal training data, requiring only 50 to 100 demonstration samples per new task [11][12].
- The model has been successfully transferred to different robotic platforms, demonstrating its versatility in both general instruction-following tasks and industrial-grade operations [13][14].

Group 3: Developer Support
- Google has released the Gemini Robotics SDK to give developers access to the technology, allowing easy evaluation of the model's performance across tasks and environments [15].
- The SDK includes the MuJoCo physics simulator for testing ideas in simulation before deploying to real robots, reducing development cost and risk [15][16].
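The simulate-before-deploy workflow rests on MuJoCo, whose open-source Python bindings can be tried independently of the SDK. A minimal load-and-step loop, assuming the `mujoco` package is installed (the toy XML model is a stand-in, not a Gemini Robotics asset):

```python
import mujoco

# Minimal MuJoCo loop, illustrating the simulate-before-deploy workflow the
# SDK builds on. The toy XML below is a stand-in, not a Gemini Robotics asset.
XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="2 2 0.1"/>
    <body name="box" pos="0 0 1">
      <joint type="free"/>
      <geom type="box" size="0.1 0.1 0.1" mass="1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

for _ in range(500):             # ~1 s at the default 2 ms timestep
    mujoco.mj_step(model, data)  # advance the physics one step

print("box height after falling:", data.qpos[2])  # settles near 0.1 (half-size)
```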