o3 Reasoning Model
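OpenAI's undisclosed o3 "thinking with images" technique has been tentatively reproduced by Xiaohongshu and Xi'an Jiaotong University
机器之心 (Jiqizhixin) · 2025-05-31 06:30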
Core Viewpoint
- OpenAI's o3 reasoning model has moved beyond purely text-based thinking by integrating images directly into the reasoning process, reaching a new level of multimodal reasoning [1][4][29]

Group 1: Model Capabilities
- The o3 model can analyze an image and derive an answer by focusing on the relevant regions, such as formulas in a physics exam or structural elements in an architectural drawing, achieving 95.7% accuracy on the V* Bench visual reasoning benchmark [1]
- DeepEyes, developed jointly by Xiaohongshu and Xi'an Jiaotong University, demonstrates o3-like capabilities, reasoning with images without relying on supervised fine-tuning [1][29]

Group 2: Reasoning Process
- DeepEyes follows a three-step reasoning process: global visual analysis, intelligent tool invocation, and detail-level reasoning and recognition, which together show its ability to think with images [7][10]
- The model's architecture introduces a "self-driven visual focus" mechanism that lets it decide dynamically, based on the reasoning context, when to draw on image information [14]

Group 3: Learning Mechanism
- DeepEyes uses an outcome-based reinforcement learning strategy, inspired by biological evolution, to develop its image reasoning capabilities without supervised fine-tuning [18][19]
- Learning proceeds in three stages: a novice phase with low accuracy, an exploration phase with increased tool usage, and a mature phase in which the model effectively predicts the key regions to analyze [21]

Group 4: Performance Metrics
- DeepEyes performs strongly across visual reasoning tasks, achieving 90.1% accuracy on the V* Bench and outperforming existing workflow-based methods [23]
- The model also shows improved mathematical reasoning, indicating potential for cross-task performance [24]

Group 5: Advantages of DeepEyes
- Compared with traditional models, DeepEyes offers a simpler training process, stronger generalization, end-to-end joint optimization, deeper multimodal integration, and native tool invocation [26][28][29]
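The three-step loop described for DeepEyes (global analysis, tool invocation, detail reasoning) can be sketched roughly as follows. This is a toy illustration only: the summary does not describe DeepEyes' actual tool interface, so `Step`, `crop`, `think_with_images`, and `toy_policy` are hypothetical names, the "image" is a plain grid of numbers, and the policy is a hand-written stub standing in for the trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class Step:
    """One policy decision: zoom into a region, or give a final answer."""
    crop_box: Optional[Box] = None
    answer: Optional[str] = None


def crop(image: Image, box: Box) -> Image:
    """Hypothetical zoom tool: return the region inside box, clamped to the image."""
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


def think_with_images(
    image: Image,
    question: str,
    policy: Callable[[List[Image], str], Step],
    max_tool_calls: int = 3,
) -> Tuple[str, List[Image]]:
    """Alternate between policy decisions and crop-tool calls until an answer appears."""
    views: List[Image] = [image]  # step 1: the policy first sees the whole image
    for _ in range(max_tool_calls):
        step = policy(views, question)
        if step.answer is not None or step.crop_box is None:
            return step.answer or "", views
        # step 2: the policy asked for a closer look, so run the tool
        views.append(crop(image, step.crop_box))
    # tool budget exhausted: ask once more and accept whatever answer comes back
    final = policy(views, question)
    return final.answer or "", views


def toy_policy(views: List[Image], question: str) -> Step:
    """Stand-in for the model: zoom on the brightest pixel, then read it off."""
    if len(views) == 1:
        img = views[0]
        coords = [(r, c) for r in range(len(img)) for c in range(len(img[0]))]
        row, col = max(coords, key=lambda rc: img[rc[0]][rc[1]])
        return Step(crop_box=(col - 1, row - 1, col + 2, row + 2))
    # step 3: reason over the zoomed-in detail view
    detail = views[-1]
    return Step(answer=str(max(max(r) for r in detail)))


if __name__ == "__main__":
    picture = [[0] * 5 for _ in range(5)]
    picture[3][4] = 9
    print(think_with_images(picture, "What is the brightest value?", toy_policy))
```

In a trained system the policy would be the multimodal model itself, and per the summary the decision of when to zoom is learned from outcome rewards rather than hard-coded as it is in this stub.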
Silicon Valley giants pause data center construction; the computing power narrative is getting hard to sustain
36Kr · 2025-04-27 06:34
Core Viewpoint
- The article discusses the recent caution of major tech companies, particularly Amazon and Microsoft, about expanding AI data centers, which points to a potential slowdown in the AI industry and a reevaluation of the demand for computing power [1][2][3].

Group 1: Company Actions
- Alibaba's chairman voiced concerns about a bubble in the U.S. data center market, which weighed on China's domestic AI computing power stocks [1].
- Amazon Web Services (AWS) has reportedly paused some data center leasing negotiations, particularly for international sites, mirroring Microsoft's recent actions [1][2].
- Microsoft confirmed that it has suspended a $1 billion investment plan for three data centers in Ohio, suggesting a broader trend of scaling back new projects in the AI sector [1][2].

Group 2: Industry Trends
- The collaboration between AWS and AI startup Anthropic is described as a strong partnership, in contrast to Microsoft's relationship with OpenAI, which appears less stable [2].
- The emergence of open-source models, particularly DeepSeek, has prompted a reassessment of the value of foundational large models and led many AI startups to reconsider their strategies [2][3].
- Overall demand for data center computing power is expected to decline: with fewer companies developing AI models, there are fewer customers for data center leasing [3].

Group 3: AI Model Development
- The pace of AI model advances has reportedly slowed, with some entrepreneurs expressing disappointment over the lack of significant progress since August of the previous year [3][5].
- Benchmark scores and user experience have diverged, with notable examples such as Meta's Llama 4 and OpenAI's o3 model failing to meet user expectations despite high scores in competitive settings [3][4].
- The AI industry is going through a cycle similar to that of the smartphone industry, in which the focus on performance metrics overshadowed actual user experience [4][5].

Group 4: Market Sentiment
- The article suggests that the current state of the AI industry reflects broader disillusionment: the anticipated killer applications have yet to materialize, leaving a shortage of products that generate sustainable revenue [4][5].
- The rapid hiring and subsequent layoffs in Silicon Valley during the pandemic are cited as a cautionary tale, with companies now facing the consequences of overexpanding during a period of perceived growth [5][6].
- The optimism that once surrounded the internet industry's growth is contrasted with a more cautious outlook for AI, suggesting companies may struggle to sustain the same level of enthusiasm [6].
OpenAI announces GPT-4 will be retired at the end of this month and fully replaced by GPT-4o
news flash · 2025-04-12 13:48
Core Insights
- OpenAI announced that GPT-4 will be fully replaced by GPT-4o in ChatGPT starting April 30, while GPT-4 will remain available through the API [1]
- GPT-4o has outperformed GPT-4 in writing, coding, and STEM tasks in head-to-head evaluations [1]
- A series of new AI models will be unveiled next week, including GPT-4.1, an improved multimodal version of GPT-4o [1]

Model Developments
- OpenAI will introduce smaller versions of the new model, specifically GPT-4.1 mini and nano [1]
- New reasoning models named o3 and o4-mini will also be launched [1]