Foundation Models
BUPT's first IJRR paper: with Samsung Research China, Tsinghua University and others, a joint survey on "large models for robot manipulation"
机器人大讲堂· 2025-11-24 08:31
Realizing the general-purpose robots of the film "I, Robot" has long been a goal of robotics researchers. However, general-purpose manipulation in unstructured scenes remains challenging. Learning-based methods are considered an effective path to general manipulation, but challenges remain: 1) unnatural interaction with humans; 2) data scarcity; 3) limited perception; 4) limited decision-making; 5) inaccurate pre- and post-processing; 6) insufficiently robust policies; and 7) poor transferability across environments. Recently, Professor Bin Fang's team at Beijing University of Posts and Telecommunications (BUPT), together with Samsung Research China, Professors Fuchun Sun and Huaping Liu of Tsinghua University, and Academician Jianwei Zhang of the University of Hamburg, published "What Foundation Models can Bring for Robot Learning in Manipulation: A Survey" in the International Journal of Robotics Research, exploring how foundation models can empower intelligent robot manipulation. https://journals.sagepub.com/eprint/NHMPYHAYJ6SUVQYSUWZI/full The emergence of foundation models has kindled researchers' hopes of solving the problems above: 1) LLMs can directly generate policy code or action sequences and enable natural interaction between robots and their environment; 2) VFMs enhance ...
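To make the first point concrete, here is a minimal sketch of the LLM-as-policy-generator pattern the survey describes. The prompt, the `llm_complete` stub, and the `move_to`/`grasp`/`release` robot primitives are illustrative assumptions, not an interface from the paper:

```python
# Sketch: an LLM emits a whitelisted action sequence that a robot driver executes.
# The LLM call and the robot API below are hypothetical stand-ins.

PROMPT = """You control a robot arm with these primitives:
move_to(x, y, z), grasp(), release().
Task: pick the red cube at (0.4, 0.1, 0.02) and place it in the bin at (0.6, -0.2, 0.05).
Reply with one primitive call per line."""

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM API request; returns a canned plan here.
    return ("move_to(0.4, 0.1, 0.10)\nmove_to(0.4, 0.1, 0.02)\ngrasp()\n"
            "move_to(0.6, -0.2, 0.15)\nmove_to(0.6, -0.2, 0.05)\nrelease()")

ALLOWED = {"move_to", "grasp", "release"}  # validate before executing

def execute(plan: str, robot) -> None:
    for line in plan.strip().splitlines():
        name = line.split("(")[0].strip()
        if name not in ALLOWED:
            raise ValueError(f"non-whitelisted primitive: {name}")
        arg_str = line[line.index("(") + 1 : line.rindex(")")]
        args = [float(a) for a in arg_str.split(",")] if arg_str else []
        getattr(robot, name)(*args)  # dispatch to the robot driver
```

Whitelisting the primitives before dispatch is the usual guard against an LLM emitting arbitrary code.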
Chinese and international experts explore the AI frontier and industry empowerment
Xin Lang Cai Jing· 2025-11-21 07:23
"Research shows that the number of tokens consumed to improve foundation model capability grows exponentially," said Jiang Daxin. In the age of electricity, he noted, power consumption was one gauge of a country's economic activity; in the AI age, that gauge may become token consumption. "The higher a model's inference efficiency, the lower the cost of producing a token. We need joint optimization across the industry chain, co-design of models and chips, and joint innovation in systems and architecture to raise model inference efficiency."

Beyond the technical evolution of foundation models, the application of intelligent systems in real-world scenarios was another focus for the experts. Osmar Zaïane, professor at the University of Alberta in Canada, pointed out that as intelligent systems increasingly operate in dynamic, unpredictable environments, robots that respond effectively to change are becoming ever more important.

The 5th Intelligent Computing Innovation Forum, co-hosted by Zhejiang Lab and Science/AAAS, was held on November 20 in Hangzhou, Zhejiang (China News Service, Hangzhou, November 20, reporter Bao Mengni). Zaïane added that China's leading strength in fields such as intelligent manufacturing has informed his research: "In China, we can quickly observe the practical results of collaboration among different agents and between humans and agents. This is an excellent proving ground for new technologies." The Intelligent Computing Innovation Forum has now been held successfully four times, building a platform for intelligent computing ...
Liu Debing on the ceiling, Liu Zhiyuan on the inflection point: the script for China's next decade of AI, revealed ahead of time
36Ke· 2025-11-20 09:57
He described the current stage, within the coming decade, as "about to enter the eve of the climax of the artificial intelligence revolution."

At the 2025 AI+ Conference held in Zhongguancun, the key "progress bar" for China's next decade of AI is coming into focus. On the sidelines of the conference, Liu Debing, senior advisor of the AI 100 (人工智能百人会) and chairman of Zhipu, and Liu Zhiyuan, co-founder and chief scientist of ModelBest (面壁智能) and associate professor at Tsinghua University, gave exclusive interviews to Zhidongxi (智东西). Long-time front-line practitioners both, they shared their observations and thinking on the coming decade, from foundation models to the evolution of agents.

On competition among foundation models, Liu Debing did not shy away from reality: with open source now mainstream and results publicly verifiable, gaps in model capability are quickly amplified: "When front-line open-source models already score 90, training another model that scores 85 carries little competitive value."

He also stressed the importance of persisting with work that is hard but right, however large the investment, because "foundation models determine the ceiling of the entire AI industry." In his view, the key variables ahead will come more from the maturing of the open-source ecosystem, deep adoption in industry scenarios, and the broad participation that follows as AI gradually becomes a "universal capability."

For Liu Zhiyuan, a marked inflection point in 2025 is "AI + coding," a capability that is becoming an important pillar of software productivity. On how large models evolve into agents, what he emphasizes is not stacking on more knowledge but giving models "the capacity to grow by learning autonomously in a designated job," like a university graduate, through feedback from real tasks ...
Zhongtai Securities: Gemini 3 Pro capabilities leap across the board, opening a new landscape for agent platforms
Zhi Tong Cai Jing· 2025-11-20 08:01
Core Insights - The release of Gemini 3 by Google demonstrates significant advancements in AI model capabilities, indicating that the progress in model intelligence has not yet reached its ceiling [1][2] - The report suggests focusing on companies with strong fundamentals in the foundational computing layer, model layer, and B-end vendors that deeply integrate services into business processes [1] Investment Events - Google officially launched the Gemini 3 series, including the Gemini 3 Pro model, on November 18, 2025, achieving state-of-the-art (SOTA) performance across multiple evaluation dimensions [1] Performance Metrics - Gemini 3 Pro scored 37.5% in the Humanity's Last Exam, surpassing GPT-5.1 (26.5%) and Claude Sonnet 4.5 (13.7%), showcasing doctoral-level reasoning capabilities [2] - In the MathArena Apex test, Gemini 3 Pro achieved a score of 23.4%, significantly outperforming GPT-5.1 (1.0%) and Claude Sonnet 4.5 (1.6%), indicating a leap in deep reasoning abilities [2] Multi-Modal Architecture and User Interface - Gemini 3 Pro continues the original multi-modal architecture and introduces a Generative User Interface (Generative UI) that allows for customized interactive responses based on user prompts [3] - Google launched the Antigravity platform for AI agent development, enabling developers to utilize models like Gemini 3 Pro and Claude Sonnet 4.5 for free, enhancing programming efficiency through autonomous task execution [3] Search Enhancements - Google has upgraded its search capabilities with Gemini 3, improving query fan-out technology to enhance search efficiency and user experience through interactive tools and dynamic visual presentations [4] Ecosystem Trends - The report highlights a trend of major foundational model companies building comprehensive ecosystems, with firms like OpenAI, Anthropic, and Google transitioning from model providers to platform developers [5] - In coding scenarios, tools like Antigravity and Anthropic's Claude Code are being integrated into foundational models, blurring the lines between standalone SaaS products and model modules [5]
OmniDexGrasp revealed: foundation models + force feedback, a general recipe for robots that "understand instructions and grasp dexterously"
具身智能之心· 2025-10-31 00:04
Core Insights - The article discusses the OmniDexGrasp framework, which addresses the challenges of dexterous grasping in robotics by combining foundation models with force feedback control to achieve generalizable and physically feasible grasping [1][2][21]. Group 1: Challenges in Dexterous Grasping - Current dexterous grasping solutions face a dilemma between data-driven approaches, which struggle with generalization due to limited datasets, and foundation models, which often fail to translate abstract knowledge into physical actions [2]. - The core issue is the inability to balance generalization and physical feasibility, leading to failures in grasping new objects or in complex scenarios [2]. Group 2: OmniDexGrasp Framework - OmniDexGrasp employs a three-stage approach: generating human grasping images, action transfer to robots, and force feedback control, effectively bridging the gap between abstract knowledge and physical execution [4][21]. - The framework retains the generalization capabilities of foundation models while ensuring physical feasibility through precise action transformation and control strategies [4]. Group 3: Key Modules of OmniDexGrasp - **Module 1**: Generates human grasping images to help robots understand how to grasp objects, utilizing a variety of input designs to accommodate different user needs [6][8]. - **Module 2**: Translates human grasping images into robot actions, addressing the challenge of aligning human intent with robotic capabilities through a three-step transfer strategy [9][12]. - **Module 3**: Implements force feedback control to ensure stable and safe grasping, adapting to the physical properties of objects and preventing damage during the grasping process [12][13]. Group 4: Experimental Results - OmniDexGrasp demonstrated an average success rate of 87.9% across six core grasping tasks, significantly outperforming traditional methods [15]. - In comparative tests, OmniDexGrasp showed superior generalization, especially with new objects, achieving success rates that far exceeded those of existing solutions [16][18]. Group 5: Future Directions - The framework suggests future enhancements through multi-modal observation integration and deeper control task development, aiming for end-to-end general manipulation capabilities [22]. - The potential for OmniDexGrasp to extend beyond grasping to broader manipulation tasks is highlighted, indicating its versatility in robotic applications [20].
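To illustrate the control idea behind Module 3, here is a minimal force-feedback grasp loop; the proportional gain, target force, and the sensor/actuator callables are illustrative assumptions rather than the paper's implementation:

```python
# Illustrative force-feedback grasp loop: tighten until a target contact
# force is reached, loosening on overshoot to avoid damaging the object.
# Gains and thresholds are made-up placeholders, not values from the paper.

def force_feedback_grasp(read_force, set_grip, target_n=2.0,
                         kp=0.01, tol=0.2, max_steps=200):
    """read_force() -> measured contact force in newtons;
    set_grip(delta) -> tighten (delta > 0) or loosen (delta < 0) the grip."""
    for _ in range(max_steps):
        error = target_n - read_force()
        if abs(error) < tol:
            return True            # stable grasp at the target force
        set_grip(kp * error)       # proportional step; sign handles overshoot
    return False                   # did not converge within the step budget
```

Closing the loop on measured force, rather than commanding a fixed grip width, is what lets the same policy adapt to objects of different stiffness.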
Confirmed: more GPUs, higher paper acceptance rates, more citations
机器之心· 2025-10-17 08:12
Core Insights - The article discusses the significant advancements in the AI field over the past three years, primarily driven by the development of foundational models, which require substantial data, computational power, and human resources [2][4]. Resource Allocation and Research Impact - The relationship between hardware resources and the publication of top-tier AI/ML conference papers has been analyzed, focusing on GPU availability and TFLOPs [4][5]. - A total of 5,889 foundational model-related papers were identified, revealing that stronger GPU acquisition capabilities correlate with higher acceptance rates and citation counts in eight leading conferences [5][9]. Research Methodology - The study collected structured information from 34,828 accepted papers between 2022 and 2024, identifying 5,889 related to foundational models through keyword searches [8][11]. - A survey of 229 authors from 312 papers indicated a lack of transparency in GPU usage reporting, highlighting the need for standardized resource disclosure [9][11]. Growth of Foundational Model Research - From 2022 to 2024, foundational model research has seen explosive growth, with the proportion of related papers in top AI conferences rising significantly [18][19]. - In NLP conferences, foundational model papers have outpaced those in general machine learning conferences [22]. Research Contributions by Academia and Industry - Academic institutions contributed more papers overall, while top industrial labs excelled in single-institution output, with Google and Microsoft leading in paper production [29][32]. - The research efficiency between academia and industry is comparable, with industry researchers publishing an average of 8.72 papers and academia 7.93 papers [31]. Open Source Models and GPU Usage - Open-source models, particularly the LLaMA series, have become the predominant choice in research, favored for their flexibility and accessibility [35][37]. - NVIDIA A100 is the most widely used GPU in foundational model research, with a notable concentration of GPU resources among a few institutions [38][39]. Funding Sources and Research Focus - Government funding is the primary source for foundational model research, with 85.5% of papers receiving government support [41][42]. - The focus of research has shifted towards algorithm development and inference processes, with a significant portion of papers dedicated to these areas [42]. Computational Resources and Research Output - The total computational power measured in TFLOPs is more strongly correlated with research output and citation impact than the sheer number of GPUs used [44][45]. - While more resources can improve acceptance rates, the quality of research and its novelty remain critical factors in the review process [47].
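The reported finding, that total TFLOPs tracks impact more closely than raw GPU count, corresponds to a simple rank-correlation analysis. A sketch under assumed column names (the study's actual dataset and statistics are the authors'):

```python
# Sketch of correlating compute with citation impact. The CSV file and
# column names are hypothetical placeholders for the study's data.
import pandas as pd
from scipy.stats import spearmanr

papers = pd.read_csv("fm_papers.csv")  # hypothetical: one row per paper

# Spearman rank correlation is robust to the heavy-tailed distributions
# of both compute budgets and citation counts.
for resource in ["num_gpus", "total_tflops"]:
    rho, p = spearmanr(papers[resource], papers["citations"])
    print(f"{resource}: rho={rho:.2f}, p={p:.3g}")
```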
2025 Apsara Conference opens in Hangzhou; thousands of tech products on display
Zhong Guo Xin Wen Wang· 2025-09-25 01:17
[Event photos (chinanews.com.cn): exhibit banners including the Tongyi (通义) booth and "World's Leading Foundation Model Family".]
Foundation models for autonomous driving should be capability-oriented, not confined to the methods themselves
自动驾驶之心· 2025-09-16 23:33
Core Insights - The article discusses the transformative impact of foundational models on the autonomous driving perception domain, shifting from task-specific deep learning models to versatile architectures trained on vast and diverse datasets [2][4] - It introduces a new classification framework focusing on four core capabilities essential for robust performance in dynamic driving environments: general knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning [2][5] Group 1: Introduction and Background - Autonomous driving perception is crucial for enabling vehicles to interpret their surroundings in real-time, involving key tasks such as object detection, semantic segmentation, and tracking [3] - Traditional models, designed for specific tasks, exhibit limited scalability and poor generalization, particularly in "long-tail scenarios" where rare but critical events occur [3][4] Group 2: Foundational Models - Foundational models, developed through self-supervised or unsupervised learning strategies, leverage large-scale datasets to learn general representations applicable across various downstream tasks [4][5] - These models demonstrate significant advantages in autonomous driving due to their inherent generalization capabilities, efficient transfer learning, and reduced reliance on labeled datasets [4][5] Group 3: Key Capabilities - The four key dimensions for designing foundational models tailored for autonomous driving perception are: 1. General Knowledge: Ability to adapt to a wide range of driving scenarios, including rare situations [5][6] 2. Spatial Understanding: Deep comprehension of 3D spatial structures and relationships [5][6] 3. Multi-Sensor Robustness: Maintaining high performance under varying environmental conditions and sensor failures [5][6] 4. Temporal Reasoning: Capturing temporal dependencies and predicting future states of the environment [6] Group 4: Integration and Challenges - The article outlines three mechanisms for integrating foundational models into autonomous driving technology stacks: feature-level distillation, pseudo-label supervision, and direct integration [37][40] - It highlights the challenges faced in deploying these models, including the need for effective domain adaptation, addressing hallucination risks, and ensuring efficiency in real-time applications [58][61] Group 5: Future Directions - The article emphasizes the importance of advancing research in foundational models to enhance their safety and effectiveness in autonomous driving systems, addressing current limitations and exploring new methodologies [2][5][58]
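Of the three integration mechanisms listed in Group 4, feature-level distillation is the most algorithmic. A minimal PyTorch-style sketch, with the student architecture, dimensions, and loss weight all illustrative assumptions:

```python
# Minimal feature-level distillation sketch: a compact student backbone is
# trained to match a frozen foundation model's features alongside its own
# task loss. Architecture, dims, and alpha are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentBackbone(nn.Module):
    def __init__(self, feat_dim=256, teacher_dim=1024):
        super().__init__()
        self.backbone = nn.Sequential(      # stand-in deployable encoder
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.project = nn.Conv2d(feat_dim, teacher_dim, 1)  # align channels

    def forward(self, x):
        feats = self.backbone(x)
        return feats, self.project(feats)

def distill_step(student, teacher, images, task_loss_fn, alpha=0.5):
    with torch.no_grad():
        t_feats = teacher(images)           # frozen foundation-model features
    s_feats, s_aligned = student(images)
    # Match spatial resolution before comparing feature maps.
    t_feats = F.adaptive_avg_pool2d(t_feats, s_aligned.shape[-2:])
    return task_loss_fn(s_feats) + alpha * F.mse_loss(s_aligned, t_feats)
```

The appeal for deployment is that only the small student runs on the vehicle; the foundation model is needed only at training time.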
Nature Medicine: Bin Sheng and Tien Yin Wong's team develops an ophthalmic AI foundation model that markedly improves ophthalmologists' diagnostic performance and patient outcomes
生物世界· 2025-09-01 08:30
Core Viewpoint - The article highlights how foundation models (FMs) have advanced the potential applications of artificial intelligence (AI) in clinical care, while emphasizing the need for rigorous prospective validation and randomized controlled trials to bridge the gap between AI capabilities and real-world clinical environments [2][3][6]. Group 1: Foundation Model Development - A multi-modal visual-language ophthalmic foundation model named EyeFM was developed, which was validated through a prospective deployment across various global regions, including Asia, North America, Europe, and Africa [3][6]. - EyeFM was pre-trained using a diverse dataset of 14.5 million eye images, enabling it to perform various core clinical tasks effectively [6][11]. Group 2: Clinical Evaluation and Effectiveness - The effectiveness of EyeFM as a clinical assistance tool was evaluated through a randomized controlled trial involving 668 participants, showing a higher correct diagnosis rate of 92.2% compared to 75.4% in the control group [11][13]. - The study also indicated improved referral rates (92.2% vs 80.5%) and better self-management adherence (70.1% vs 49.1%) among the intervention group using EyeFM [11][13]. Group 3: Application and Future Implications - EyeFM serves as a comprehensive assistance system for ophthalmology, with potential applications across various clinical scenarios, enhancing the diagnostic capabilities of ophthalmologists and improving patient outcomes [12][13].
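As a rough sanity check on the headline diagnosis rates, a two-proportion comparison; the even 334/334 split of the 668 participants is an assumption for illustration, and the trial's actual arms and statistics may differ:

```python
# Two-proportion z-test on the reported correct-diagnosis rates
# (92.2% intervention vs 75.4% control). The even 334/334 split of the
# 668 participants is an assumption for illustration only.
from statsmodels.stats.proportion import proportions_ztest

successes = [round(0.922 * 334), round(0.754 * 334)]   # [308, 252]
z, p = proportions_ztest(successes, nobs=[334, 334])
print(f"z = {z:.2f}, p = {p:.3g}")  # a large z means the gap is unlikely by chance
```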
FDA has approved over 1,200 AI medical devices: beyond imaging, which specialties are expanding next?
思宇MedTech· 2025-08-21 03:50
Core Viewpoint - Artificial Intelligence (AI) is rapidly penetrating the medical device field, with over 1,200 AI/ML medical devices approved by the FDA as of July 2025, including a record 235 devices approved in 2024, indicating that AI is becoming a significant part of clinical practice [2][4]. Group 1: AI in Medical Imaging - Radiology remains the dominant application area for AI, focusing on tasks such as automatic image segmentation, lesion detection, and risk screening [4]. - The cardiovascular specialty is experiencing accelerated adoption of AI, expanding from ECG rhythm analysis to cardiac ultrasound and CT coronary imaging due to the high prevalence of cardiovascular diseases and the suitability of imaging data for AI training [5][6]. Group 2: AI in Neurology - In neurology, AI's initial entry point is acute stroke image recognition, with applications including arrhythmia detection and heart failure risk prediction [7][8]. - AI systems can automatically interpret CT/MRI scans within minutes, identifying potential ischemic or hemorrhagic lesions and notifying neurologists, thus shortening the "golden hour" for treatment [9]. - Neurology is emerging as a new growth area for FDA approvals due to high-risk, high-value disease scenarios, such as the urgent need for stroke decision-making and unmet needs in epilepsy and dementia [10]. Group 3: Emerging Specialties - Other specialties, including endoscopy and pathology, are also seeing rapid growth in AI medical devices, with applications in automatic identification of polyps and early tumors during gastrointestinal examinations [12]. - AI is enhancing efficiency in pathology by automating the identification and classification of digital pathology slides, allowing pathologists to quickly locate suspicious areas [12]. Group 4: Regulatory Challenges - As the number of FDA-approved AI medical devices surpasses 1,200, regulatory challenges are emerging, particularly in keeping pace with technological advancements [11]. - The focus of FDA regulation is shifting from merely approving the number of AI devices to balancing innovation with safety, necessitating a reevaluation of regulatory frameworks as AI evolves from a "tool" to a "partner" in healthcare [11][14].