Foundation Models

2025 Yunqi Conference Opens in Hangzhou, with Thousands of Tech Products on Display
Zhong Guo Xin Wen Wang· 2025-09-25 01:17
t 30 组 1 1925 華 寶 領域視觉模型及政府 t 通义官 es 11:2 chinanews.com.cn 2025 云栖大会 chinanews.com.cn 球领先的基础 World's Leading Foundation Model Fam P chinanews.com.cn ...... 111111 1111 12222 our and - 8 C E y chinanews.com.cn 241 chinanews.com.cn chinanews.com.cn 为不同工业务服打 1+N+S = 00 24 HEI chinanews.com.cn r in 式 视 影院院 Inteller The 胜年 5 略中原 11 打組 I ● ehinanews.com.cn ll Park 图 线 游 电 Ed chinanews.com.cn A 0 MITTING THE COLLECTION CONTRACTOR COLLECTION OF THE CONTRACT THE CONTRACT THE CONTRACT THE CONTRACT THE CONTRACT THE CO ...
Autonomous Driving Foundation Models Should Be Capability-Oriented, Not Confined to Methods Alone
自动驾驶之心· 2025-09-16 23:33
Core Insights
- The article discusses the transformative impact of foundation models on autonomous driving perception, a shift from task-specific deep learning models to versatile architectures trained on vast and diverse datasets [2][4]
- It introduces a new classification framework built around four core capabilities essential for robust performance in dynamic driving environments: general knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning [2][5]

Group 1: Introduction and Background
- Autonomous driving perception is crucial for enabling vehicles to interpret their surroundings in real time, and involves key tasks such as object detection, semantic segmentation, and tracking [3]
- Traditional models, designed for specific tasks, exhibit limited scalability and poor generalization, particularly in "long-tail scenarios" where rare but critical events occur [3][4]

Group 2: Foundation Models
- Foundation models, developed through self-supervised or unsupervised learning strategies, leverage large-scale datasets to learn general representations applicable across a variety of downstream tasks [4][5]
- These models offer significant advantages for autonomous driving thanks to their inherent generalization ability, efficient transfer learning, and reduced reliance on labeled datasets [4][5]

Group 3: Key Capabilities
- The four key dimensions for designing foundation models tailored to autonomous driving perception are:
  1. General Knowledge: the ability to adapt to a wide range of driving scenarios, including rare situations [5][6]
  2. Spatial Understanding: deep comprehension of 3D spatial structures and relationships [5][6]
  3. Multi-Sensor Robustness: maintaining high performance under varying environmental conditions and sensor failures [5][6]
  4. Temporal Reasoning: capturing temporal dependencies and predicting future states of the environment [6]

Group 4: Integration and Challenges
- The article outlines three mechanisms for integrating foundation models into autonomous driving stacks: feature-level distillation, pseudo-label supervision, and direct integration (a minimal distillation sketch follows this summary) [37][40]
- It highlights deployment challenges, including the need for effective domain adaptation, mitigation of hallucination risks, and efficiency in real-time applications [58][61]

Group 5: Future Directions
- The article emphasizes advancing research on foundation models to improve their safety and effectiveness in autonomous driving systems, addressing current limitations and exploring new methodologies [2][5][58]
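Of the three integration mechanisms named above, feature-level distillation is the most self-contained to illustrate. The sketch below is a generic, hypothetical setup rather than any specific surveyed method: a frozen foundation-model encoder serves as teacher, a compact perception backbone as student, and a cosine loss aligns their features; all names (`DistilledPerceptionModel`, `proj`, `lambda_distill`) are illustrative.

```python
# Minimal sketch of feature-level distillation for a perception backbone.
# Hypothetical setup: `teacher` is a frozen foundation-model encoder,
# `student` is a compact backbone trained for a downstream perception task.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledPerceptionModel(nn.Module):
    def __init__(self, student: nn.Module, teacher: nn.Module, s_dim: int, t_dim: int):
        super().__init__()
        self.student = student
        self.teacher = teacher.eval()           # teacher stays frozen throughout
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.proj = nn.Linear(s_dim, t_dim)     # map student features into teacher space

    def forward(self, images: torch.Tensor):
        s_feat = self.student(images)           # (B, s_dim) pooled student features
        with torch.no_grad():
            t_feat = self.teacher(images)       # (B, t_dim) pooled teacher features
        # Cosine distillation loss: align the direction of student and teacher features.
        distill = 1.0 - F.cosine_similarity(self.proj(s_feat), t_feat, dim=-1).mean()
        return s_feat, distill

# Training step (sketch): combine the task loss with the distillation term, e.g.
# total_loss = task_loss(head(s_feat), targets) + lambda_distill * distill
```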
Nature Medicine: Bin Sheng and Tien Yin Wong's Team Develops an Ophthalmic AI Foundation Model That Significantly Improves Ophthalmologists' Diagnostic Performance and Patient Outcomes
生物世界· 2025-09-01 08:30
Core Viewpoint
- The article highlights the significant advance foundation models (FMs) represent for potential applications of artificial intelligence (AI) in clinical care, and the need for rigorous prospective validation and randomized controlled trials to bridge the gap between AI capabilities and real-world clinical environments [2][3][6]

Group 1: Foundation Model Development
- A multimodal vision-language ophthalmic foundation model named EyeFM was developed and validated through prospective deployment across global regions including Asia, North America, Europe, and Africa [3][6]
- EyeFM was pre-trained on a diverse dataset of 14.5 million eye images, enabling it to perform a range of core clinical tasks effectively (a sketch of the general pretraining technique follows this summary) [6][11]

Group 2: Clinical Evaluation and Effectiveness
- EyeFM's effectiveness as a clinical assistance tool was evaluated in a randomized controlled trial with 668 participants, showing a correct diagnosis rate of 92.2%, versus 75.4% in the control group [11][13]
- The study also showed higher referral rates (92.2% vs 80.5%) and better self-management adherence (70.1% vs 49.1%) in the intervention group using EyeFM [11][13]

Group 3: Application and Future Implications
- EyeFM serves as a comprehensive assistance system for ophthalmology, with potential applications across clinical scenarios, strengthening ophthalmologists' diagnostic capabilities and improving patient outcomes [12][13]
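The summary does not detail EyeFM's pretraining recipe. As background only, here is a minimal sketch of the CLIP-style image-text contrastive objective commonly used to pre-train vision-language models of this kind; the function name and the pairing of fundus images with report text are assumptions for illustration, not a description of EyeFM itself.

```python
# Sketch of the symmetric image-text contrastive objective (CLIP-style) often
# used for vision-language pretraining. Illustrative only; not EyeFM's recipe.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """img_emb, txt_emb: (B, D) embeddings of paired eye images and report text."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Each image should match its own text, and vice versa.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```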
The FDA Has Approved Over 1,200 AI Medical Devices: Beyond Imaging, Where Are the Next Expanding Specialties?
思宇MedTech· 2025-08-21 03:50
Core Viewpoint
- Artificial intelligence (AI) is rapidly penetrating the medical device field: as of July 2025, the FDA had approved more than 1,200 AI/ML medical devices, including a record 235 in 2024, indicating that AI is becoming a significant part of clinical practice [2][4]

Group 1: AI in Medical Imaging
- Radiology remains the dominant application area for AI, focused on tasks such as automatic image segmentation, lesion detection, and risk screening [4]
- The cardiovascular specialty is adopting AI at an accelerating pace, expanding from ECG rhythm analysis to cardiac ultrasound and coronary CT imaging, driven by the high prevalence of cardiovascular disease and the suitability of imaging data for AI training [5][6]

Group 2: AI in Neurology
- In neurology, AI's initial entry point is acute stroke image recognition, paralleling cardiology's earlier entry via arrhythmia detection and heart failure risk prediction [7][8]
- AI systems can automatically interpret CT/MRI scans within minutes, flagging potential ischemic or hemorrhagic lesions and notifying neurologists, helping treatment begin within the "golden hour" [9]
- Neurology is emerging as a new growth area for FDA approvals, driven by high-risk, high-value disease scenarios such as time-critical stroke decision-making and unmet needs in epilepsy and dementia [10]

Group 3: Emerging Specialties
- Other specialties, including endoscopy and pathology, are also seeing rapid growth in AI medical devices, with applications such as automatic identification of polyps and early tumors during gastrointestinal examinations [12]
- AI is improving efficiency in pathology by automating identification and classification on digital pathology slides, allowing pathologists to quickly locate suspicious areas [12]

Group 4: Regulatory Challenges
- As the number of FDA-approved AI medical devices passes 1,200, regulatory challenges are emerging, particularly in keeping pace with technological advancement [11]
- The focus of FDA regulation is shifting from the sheer number of approved AI devices to balancing innovation with safety, requiring a rethink of regulatory frameworks as AI evolves from "tool" to "partner" in healthcare [11][14]
Baidu Executives Interpret Q2 Earnings: Next-Generation Flagship Version of Ernie Under Development
Xin Lang Ke Ji· 2025-08-20 14:04
Core Viewpoint
- Baidu reported total revenue of 32.7 billion yuan for Q2 2025, a year-on-year decline of 4%, while net profit attributable to Baidu was 7.3 billion yuan, up from 5.5 billion yuan in the same period last year [1]

Financial Performance
- Total revenue for Q2 2025 was 32.7 billion yuan, down 4% year-on-year [1]
- Net profit attributable to Baidu was 7.3 billion yuan, compared to 5.5 billion yuan in the same period last year [1]
- Non-GAAP net profit attributable to Baidu was 4.8 billion yuan, down from 7.4 billion yuan in the same period last year [1]

AI Model Strategy
- The pace of AI model iteration is unprecedented, with new models released almost weekly, each stronger than the last [2]
- The industry is seeing a diversification of foundation models, with different models excelling at different tasks, leading to multiple models coexisting at reasonable prices [3]
- Baidu's Ernie model is positioned around application-driven innovation, concentrating on strategic areas that add value to the company [3]

Future Developments
- Baidu is developing the next generation of the Ernie model, which will feature significant improvements in key functionalities [4]
- The company plans to accelerate development of its large models and will continue to iterate and upgrade existing models [4]
- Baidu is focusing on enhancing AI search capabilities and generating multimodal search results, which are well received by both regular and cloud users [4]
Baidu (BIDU) - 2025 Q2 - Earnings Call Transcript
2025-08-20 13:00
Financial Data and Key Metrics Changes
- Total revenue for the company was RMB32.7 billion, a decrease of 4% year over year [32]
- Revenue from Baidu Core was RMB26.3 billion, down 2% year over year [32]
- Baidu Core's online marketing revenue decreased 15% year over year to RMB16.2 billion [33]
- Non-online marketing revenue for Baidu Core increased 34% year over year, reaching RMB10 billion [33]
- AI cloud revenue grew 27% year over year to RMB6.5 billion [21][33]
- Operating income was RMB3.3 billion, with an operating margin of 13% [35]
- Net income attributed to Baidu was RMB7.3 billion, with diluted earnings per ADS of RMB20.35 [37]

Business Line Data and Key Metrics Changes
- The AI cloud business showed strong growth, with revenue up 27% year over year [21][33]
- Digital human technology revenue increased 55% quarter over quarter, contributing 3% of Baidu Core's online marketing revenue [30]
- Revenue generated by agents for advertisers grew 50% year over year, contributing 13% of Baidu Core's online marketing revenue [29]

Market Data and Key Metrics Changes
- Baidu App's monthly active users reached 735 million, a 5% year-over-year increase [20]
- Daily average time spent per user in Q2 increased 4% year over year [20]

Company Strategy and Development Direction
- The company is focusing its AI transformation on Baidu Search to enhance user experience and drive long-term value [19][28]
- The strategy includes a shift from traditional search results to AI-generated, multimodal content [55]
- The company is committed to investing in AI and has made substantial investments in AI transformation, particularly in search [81]

Management's Comments on Operating Environment and Future Outlook
- Management acknowledged near-term headwinds in advertising revenue but expressed confidence in the long-term potential of AI search monetization [81]
- The company is optimistic about the growth of its AI cloud services, driven by increasing demand across sectors [76]
- Management indicated that while revenue and margins may face near-term pressure, there is potential for recovery and improved profitability over time [83]

Other Important Information
- The company has established partnerships with Uber and Lyft to expand its autonomous driving services globally [14][92]
- Apollo Go provided over 2.2 million fully driverless rides in Q2, a 148% year-over-year increase [13][26]

Q&A Session Summary

Question: How does the company view the current landscape of AI models and the positioning of Ernie?
- Management noted that the pace of model iteration is faster than ever, with a diverse landscape in which different models excel at different tasks [44][45]
- Ernie is positioned as an application-driven model focused on generating multimodal search results and enhancing user engagement [46][47]

Question: What updates can be shared regarding AI search monetization testing?
- Management indicated that the AI search transformation is progressing rapidly, with higher user engagement and retention metrics [54][55]
- The end game for AI search involves delivering intelligent, personalized responses and connecting users with real-world services [58]

Question: Can management provide a breakdown of AI cloud revenue and margin profile?
- AI cloud revenue grew 27% year over year, with subscription-based revenue accounting for more than half of the total [62]
- The company aims to reduce project-based revenue for greater stability and to improve profitability over the long term [63]

Question: What are the plans for cost optimization and margin trends?
- Management is focused on internal efficiency gains while continuing to invest in AI [81]
- Revenue and margins are expected to remain under pressure in the near term, with potential for recovery as the core advertising business stabilizes [83]

Question: How does the company assess its long-term differentiation in the autonomous driving landscape?
- Management emphasized the company's leadership in both left-hand-drive and right-hand-drive markets, with a focus on operational excellence and cost efficiency [86][90]
- Partnerships with global mobility platforms are seen as a way to accelerate market entry and scale operations [92]
Latest from TUM! A Comprehensive Review of Foundation Models for Autonomous Driving: LLMs, VLMs, MLLMs, Diffusion Models, and World Models in One Sweep
自动驾驶之心· 2025-07-29 00:52
Core Insights
- The article presents a comprehensive review of recent advances in autonomous driving, focusing on how foundation models (FMs) such as LLMs, VLMs, MLLMs, diffusion models, and world models are applied to scene generation and scene analysis [2][20][29]
- It emphasizes the importance of simulating diverse and rare driving scenarios for validating the safety and performance of autonomous driving systems, and highlights the limitations of traditional scene-generation methods [2][8][9]
- The review identifies open research challenges and future directions for improving the adaptability, robustness, and evaluation capabilities of foundation-model-driven approaches in autonomous driving [29][30]

Group 1: Foundation Models in Autonomous Driving
- Foundation models represent a new generation of pre-trained AI models capable of processing heterogeneous inputs, enabling both the synthesis and interpretation of complex driving scenarios [2][9][10]
- The emergence of foundation models opens new opportunities to improve the realism, diversity, and scalability of scenario-based testing in autonomous driving [9][10]
- The review categorizes applications of LLMs, VLMs, MLLMs, diffusion models, and world models in scene generation and analysis, providing a structured classification system [29]

Group 2: Scene Generation and Analysis
- Scene generation in autonomous driving spans several formats, including annotated sensor data, multi-camera video streams, and simulated urban environments [21]
- The article notes the limitations of the existing literature on scene generation: many reviews cover classical methods without adequately addressing the role of foundation models [23][24][25]
- Scene analysis involves systematic evaluation tasks such as risk assessment and anomaly detection, which are crucial to the safety and robustness of autonomous systems [25][28]

Group 3: Research Contributions and Future Directions
- The review provides a structured classification of existing methods, datasets, simulation platforms, and benchmark competitions for scene generation and analysis in autonomous driving [29]
- It identifies key open challenges, including better integration of foundation models into scene generation and analysis tasks, and proposes future research directions to address them [29][30]
- The article highlights the need for efficient prompting techniques and lightweight model architectures to reduce inference latency and resource consumption in real-world applications (a minimal prompting sketch follows this summary) [36][37]
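To make the LLM-driven scene-generation idea concrete, here is a hedged sketch of one common pattern in this space: prompting a language model for a structured driving scenario and validating the output before handing it to a simulator. `llm_complete` is a stand-in for an arbitrary LLM client, and the JSON schema is invented for illustration; neither comes from the surveyed paper.

```python
# Sketch of LLM-driven scenario generation with schema validation.
# `llm_complete` is a placeholder for any text-completion client.
import json

PROMPT = """Generate one rare but plausible urban driving scenario as JSON with keys:
"weather", "time_of_day", "actors" (list of {type, position_m, speed_mps}),
and "hazard" (one sentence). Output JSON only."""

REQUIRED_KEYS = {"weather", "time_of_day", "actors", "hazard"}

def generate_scenario(llm_complete, max_retries: int = 3) -> dict:
    for _ in range(max_retries):
        raw = llm_complete(PROMPT)
        try:
            scenario = json.loads(raw)
        except json.JSONDecodeError:
            continue                                  # malformed JSON: retry
        if isinstance(scenario, dict) and REQUIRED_KEYS <= scenario.keys() and scenario["actors"]:
            return scenario                           # schema check passed
    raise RuntimeError("LLM failed to produce a valid scenario")
```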
A Hard-Core 30-Minute "Argument": This Large-Model Roundtable Laid the AI Industry's Disagreements Bare
机器之心· 2025-07-28 04:24
Core Viewpoint
- The article recounts a heated debate among industry leaders at the WAIC 2025 forum on the evolution of large-model technology, covering training paradigms, model architectures, and data sources, and highlighting a significant shift from pre-training toward reinforcement learning as the dominant approach in AI development [2][10][68]

Group 1: Training Paradigms
- The forum highlighted a paradigm shift in AI from pre-training-dominant development to an approach that emphasizes reinforcement learning, marking a significant evolution in AI technology [10][19]
- OpenAI's transition from pre-training to reinforcement learning is seen as a critical development, with some experts suggesting the pre-training era is nearing its end [19][20]
- The balance between the two remains a key topic, with experts stressing the role of pre-training in establishing a strong foundation for reinforcement learning [25][26]

Group 2: Model Architectures
- The Transformer architecture has dominated AI since 2017, but its limitations are becoming apparent as parameter counts grow and context windows expand [31][32]
- Two main exploration paths exist in model architecture: optimizing existing Transformer architectures, and developing entirely new paradigms, such as Mamba and RetNet, that aim to improve efficiency and performance [33][34]
- The future of model architecture may see a return to RNN-style structures as the industry shifts toward agent-based applications requiring models to interact autonomously with their environments [38]

Group 3: Data Sources
- The article discusses the looming scarcity of high-quality data, predicting that by 2028 existing data reserves may be fully consumed, potentially stalling large-model development [41][42]
- Synthetic data is being explored as a remedy, with companies such as Anthropic and OpenAI using model-generated data to supplement training [43][44]
- Concerns about the reliability of synthetic data were raised, emphasizing the need for validation mechanisms that ensure training-data quality (a minimal filtering sketch follows this summary) [45][50]

Group 4: Open Source vs. Closed Source
- The ongoing open-source versus closed-source debate was highlighted, with open-source models such as DeepSeek gaining traction and challenging the dominance of closed-source models [60][61]
- Open-source initiatives are seen as promoting efficient resource allocation and driving industry evolution, even if they do not always produce the highest-performing models [63][64]
- The future may bring hybrid approaches combining open source and closed source, addressing challenges such as model fragmentation and misuse [66][67]
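The "validation mechanisms" point can be made concrete with a small sketch. This is a generic filtering loop under assumed interfaces (the `verifier` callable and its score are hypothetical), not the pipeline of any company named above: only model-generated pairs that an automatic check accepts are kept for training.

```python
# Sketch of one validation mechanism for synthetic training data: keep only
# generated (question, answer) pairs that an automatic verifier accepts.
from typing import Callable, Iterable

def filter_synthetic_data(
    candidates: Iterable[tuple[str, str]],
    verifier: Callable[[str, str], float],
    threshold: float = 0.9,
) -> list[tuple[str, str]]:
    """Keep only pairs whose verifier score clears the threshold."""
    kept = []
    for question, answer in candidates:
        # The verifier could be a unit-test pass rate, a checker model's score,
        # or exact-match against a symbolic solver, depending on the domain.
        if verifier(question, answer) >= threshold:
            kept.append((question, answer))
    return kept
```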
Qiming Venture Partners Again Releases Ten AI Predictions at WAIC 2025, Covering Foundation Models, AI Applications, Embodied Intelligence, and More
IPO早知道· 2025-07-28 03:47
Core Viewpoint
- Qiming Venture Partners is recognized as one of the earliest and most comprehensive investors in China's AI sector, having backed more than 100 AI projects spanning the entire AI industry chain and helping several benchmark enterprises in the field rise to prominence [2]

Group 1: AI Models
- In the next 12-24 months, a context window of 2 million tokens will become standard for top AI models, with more refined and intelligent context engineering driving the development of AI models and applications [4]
- A universal video model is expected within 12-24 months, capable of generation, reasoning, and task understanding in the video modality, transforming video content creation and interaction [6]

Group 2: AI Agents
- In the next 12-24 months, AI agents will shift from "tool assistance" to "task undertaking," with the first true "AI employees" entering enterprises and participating widely in core processes such as customer service, sales, operations, and R&D, moving from cost tools to value creation [8]
- Multimodal agents will become increasingly practical, integrating visual, auditory, and sensor inputs to perform complex reasoning, tool invocation, and task execution, achieving breakthroughs in industries such as healthcare, finance, and law [9]

Group 3: AI Infrastructure
- In the AI chip sector, more "nationally established" and "nationally produced" GPUs will begin mass delivery, while innovative new-generation AI cloud chips focusing on 3D DRAM stacking and integrated computing will come to market [11]
- In the next 12-24 months, token consumption will increase by 1 to 2 orders of magnitude, with cluster inference optimization, terminal inference optimization, and software-hardware co-designed inference optimization becoming core technologies for reducing token costs on the AI infrastructure side [12]

Group 4: AI Applications
- The paradigm shift in AI interaction will accelerate over the next two years, driven by reduced reliance on mobile screens and the rising importance of natural interaction methods such as voice, leading to the birth of AI-native super applications [14]
- The potential for AI applications in vertical scenarios is immense, with more startups leveraging industry insight to go deep in niche areas and rapidly reach product-market fit, adopting a "Go Narrow and Deep" strategy to differentiate from larger companies [15]
- The AI BPO (Business Process Outsourcing) model is expected to reach commercial breakthroughs in the next 12-24 months, moving from "delivering tools" to "delivering results" and expanding rapidly in standardized industries such as finance, customer service, marketing, and e-commerce through pay-per-result pricing [15]

Group 5: Embodied Intelligence
- Embodied intelligent robots will first achieve large-scale deployment in scenarios such as picking, transporting, and assembly, accumulating a wealth of first-person perspective data and tactile operation data, thereby building a closed-loop "model - robot body - scene data" flywheel that drives model capability iteration and ultimately enables the large-scale rollout of general-purpose robots [17]
Moonshot AI's Kimi Releases and Simultaneously Open-Sources K2, a MoE-Architecture Foundation Model with 1T Total Parameters
news flash· 2025-07-11 15:00
Core Insights
- Moonshot AI's Kimi has released K2, a foundation model built on a MoE architecture with 1 trillion total parameters and 32 billion activated parameters, surpassing other open-source models worldwide in areas such as autonomous programming, tool use, and mathematical reasoning [1]

Group 1
- K2 uses the MuonClip optimizer to train trillion-parameter models efficiently [1]
- The model raises token efficiency to open up new pre-training scaling headroom amid the bottleneck in high-quality data [1]
- K2 demonstrates stronger coding ability and excels at general agent tasks, showing improved capability generalization and practicality across multiple real-world scenarios (a minimal MoE routing sketch follows this summary) [1]

Group 2
- The new model is now open for public trial [1]
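As context for the 1T-total / 32B-activated figures, here is a minimal sketch of top-k expert routing, the mechanism that lets a MoE model store far more parameters than it activates per token. It is a generic illustration, not K2's actual architecture; all dimensions and names are placeholders.

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts (MoE) layer.
# Generic illustration only; not K2's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)   # pick k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        # All experts exist (total parameters), but only k run per token.
        return out
```

With K2's reported numbers, roughly 32B of the 1T parameters (about 3%) are activated for any given token, which is how a trillion-parameter model keeps per-token compute close to that of a much smaller dense model.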