SAM 3
Why Are Chinese AI Geniuses That $200 Million Can't Retain Collectively Jumping to OpenAI?
Xin Lang Cai Jing· 2026-02-27 10:11
Core Insights
- The article discusses the recent trend of top talent, particularly Chinese researchers, leaving Meta for OpenAI, highlighting a shift in priorities from salary to platform capabilities in the AI industry [3][5][10].

Group 1: Talent Movement
- Ruoming Pang, a prominent AI infrastructure leader at Meta, left the company after only 7 months to join OpenAI, despite a reported salary package exceeding $200 million [3][5].
- This trend is not isolated; other notable researchers, including Zhang Pengchuan, have also transitioned from Meta to OpenAI, indicating a broader pattern of talent migration [8][9][10].

Group 2: Reasons for Departure
- The primary motivation for these researchers is not financial compensation but rather the superior computational resources and modeling infrastructure available at OpenAI [6][7].
- The article emphasizes that for high-caliber professionals like Pang, the ability to explore the frontiers of AI technology matters more than salary alone [7][8].

Group 3: Industry Implications
- The departure of top talent from Meta to OpenAI reflects a significant shift in the AI landscape, where infrastructure and system efficiency are becoming the new currency of value [11][15].
- The article suggests that competition in AI is evolving from merely algorithmic prowess to a combination of theoretical and engineering expertise, as seen with the recruitment of scholars like Chen Lijie [11][12].

Group 4: Meta's Challenges
- Meta's "Super Intelligence Lab" has become a recruiting pool for OpenAI, raising concerns about Meta's ability to retain top talent and produce competitive AI products [10][15].
- The article notes that despite significant investments, Meta has struggled to deliver groundbreaking products that can rival OpenAI's offerings, leading to a perception of stagnation within the company [10][15].

Group 5: Future Outlook
- The ongoing talent redistribution indicates a recalibration of how top-tier AI professionals are valued, with a focus on those who can build and optimize foundational systems [11][15].
- The article concludes that the current environment in Silicon Valley resembles a high-stakes casino, with OpenAI currently holding the most advantageous position in the race toward Artificial General Intelligence (AGI) [15][16].
Tsinghua Math Department Star Jumps to OpenAI After Leading SAM and Llama Development; Sora Lead: Welcome Aboard
36Kr· 2026-02-25 12:23
Core Insights
- Pengchuan Zhang, a prominent researcher from Tsinghua University, has joined OpenAI to focus on World Simulation and Robotics, indicating a strategic shift towards integrating visual perception and robotics technology [1][2][17]

Group 1: Background of Pengchuan Zhang
- Zhang graduated from Tsinghua University with a major in mathematics and later obtained a PhD in Applied and Computational Mathematics from Caltech in 2017, specializing in machine learning and deep learning applications in visual fields [3][4]
- After completing his PhD, he worked at Microsoft Research as a principal researcher, leading projects in computer vision and multimodal intelligence [6][9]
- Zhang has also held a part-time assistant professor position at the University of Washington since 2021, contributing to academic research alongside his industry roles [9]

Group 2: Contributions at Meta
- At Meta FAIR, Zhang led several groundbreaking projects, including the Segment Anything 3 (SAM 3) project, which provides a unified framework for object detection, segmentation, and tracking in images and videos [10][13]
- He was also responsible for the Llama 3 and Llama 4 visual grounding projects, enhancing the models' capabilities in visual commonsense reasoning and complex scene understanding, significantly boosting Meta's generative AI competitiveness [13]

Group 3: Industry Trends and Implications
- Zhang's move to OpenAI is part of a broader trend in which several high-profile researchers are transitioning to the company, driven by its advanced computational resources and foundational infrastructure for world modeling [16][17]
- This shift suggests that OpenAI is making a significant investment in the "world model + physical intelligence" approach, which could lead to advancements in high-level robotic systems by 2026 [16][17]
SMCI vs. META: Which AI Infrastructure Stock Has an Edge Now?
ZACKS· 2026-01-21 17:11
Core Insights
- Super Micro Computer (SMCI) and Meta Platforms (META) are key players in the AI infrastructure supply chain, with SMCI focusing on high-performance servers and META acting as a hyperscale consumer of AI compute [1][2]

Group 1: SMCI Overview
- SMCI provides end-to-end AI rack-scale systems that integrate compute, networking, storage, and liquid cooling for AI data centers, utilizing advanced chips from NVIDIA and AMD [3]
- The company has introduced Data Center Building Block Solutions (DCBBS) to facilitate rapid scaling for AI data centers, which is gaining traction [4]
- SMCI is expanding its production facilities globally, diversifying into client, edge, and consumer AI markets, and aims for $36 billion in revenues by fiscal 2026, reflecting 64% year-over-year growth [5][6]

Group 2: SMCI Challenges
- Rapid expansion has led to inventory accumulation, with first-quarter fiscal 2026 closing inventory at $5.7 billion, up from $4.7 billion, and the cash conversion cycle lengthening from 96 days to 123 days [7]
- The company reported negative free cash flow of $950 million for the first quarter of fiscal 2026, with earnings growth estimates revised downward [7][8]

Group 3: META Overview
- META is heavily investing in AI infrastructure, including custom chips and large clusters to support its applications, with 79% of its total expenses in 2024 directed towards data centers and technical infrastructure [9][10]
- The company is developing custom chips for AI workloads and consolidating smaller models into larger, more efficient ones, with significant capital expenditures projected between $70 billion and $72 billion for 2025 [11][12]

Group 4: META Growth Projections
- META's AI scaling efforts include the development of a one-gigawatt Prometheus cluster and a five-gigawatt Hyperion cluster expected to launch in 2028, with revenue and earnings growth estimates for 2026 at 18% and 31%, respectively [12]
- Recent earnings estimates for META have been revised upward, indicating positive market sentiment [12]

Group 5: Stock Performance and Valuation
- Over the past six months, shares of SMCI and META have decreased by 37% and 14.3%, respectively [13]
- SMCI is trading at a forward price-to-sales ratio of 0.46X, while META is at 6.42X, both below their historical medians (the forward P/S formula is illustrated after this summary) [15]

Group 6: Conclusion
- SMCI is experiencing rapid growth driven by AI infrastructure demand but faces challenges with working-capital intensity and negative cash flow [16]
- META's long-term investments in AI infrastructure and improved technology position it favorably against SMCI, with both companies currently holding a Zacks Rank 3 (Hold) [16]
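For context on the valuation comparison, forward price-to-sales is simply market capitalization divided by the expected next-period revenue. A minimal sketch with hypothetical round numbers (placeholders chosen to land near SMCI's reported 0.46X multiple, not the article's underlying figures):

```python
def forward_ps(market_cap_billions: float, forward_revenue_billions: float) -> float:
    """Forward P/S = market capitalization / expected next-period revenue."""
    return market_cap_billions / forward_revenue_billions

# Hypothetical illustration: an $18B market cap against $39B of expected
# fiscal-2026 revenue yields roughly the 0.46X multiple cited above.
print(round(forward_ps(18.0, 39.0), 2))  # -> 0.46
```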
Meta's SAM 3: AI Vision just got a HUGE UPGRADE (FREE)
Matthew Berman· 2025-12-10 19:43
Meta just dropped SAM 3, the Segment Anything Model, and it allows you to use simple text prompting to segment anything in a video easily. Let me take a step back. There's this thing called rotoscoping. It is the extremely manual process in which a team of dozens of people segments different elements of a video by hand. And now with SAM 3, it takes seconds. I'm partnering with Meta on this video to tell you about this incredible open-source, open-weights model that allows you to do some pretty incre ...
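To make the workflow concrete, here is a minimal sketch of what text-prompted video segmentation could look like in code. The `sam3` package name, the `build_sam3_video_predictor` builder, and every method shown are assumptions for illustration only; Meta's released inference code is the authority on the real entry points.

```python
# Hypothetical usage sketch: module, builder, and method names are
# assumptions, not Meta's published SAM 3 API.
from sam3 import build_sam3_video_predictor  # assumed entry point

predictor = build_sam3_video_predictor(checkpoint="sam3.pt")

# Open a session on a video and segment every object matching a short
# noun phrase, which is the promptable-concept-segmentation idea.
state = predictor.init_state(video_path="skate_clip.mp4")
masks_per_frame = predictor.segment_concept(state, text_prompt="skateboard")

# Each frame maps stable object IDs to binary masks, so an editor can
# key out the same element across the whole clip without rotoscoping.
for frame_idx, masks in enumerate(masks_per_frame):
    print(frame_idx, {obj_id: int(mask.sum()) for obj_id, mask in masks.items()})
```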
Segmenting Everything Isn't Enough, Now Reconstruct Everything in 3D: SAM 3D Is Here
具身智能之心· 2025-11-21 00:04
Core Viewpoint
- Meta has launched significant updates with the introduction of SAM 3D and SAM 3, enhancing the understanding of images in 3D and providing advanced capabilities for object detection, segmentation, and tracking in images and videos [2][6][40].

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series, featuring two models: SAM 3D Objects and SAM 3D Body, both demonstrating state-of-the-art performance in converting 2D images into detailed 3D reconstructions [2][4].
- SAM 3D Objects allows users to generate 3D models from a single image (see the illustrative sketch after this summary), overcoming limitations of traditional 3D modeling that often relies on isolated or synthetic data [11][15].
- Meta has annotated nearly 1 million real-world images, generating approximately 3.14 million 3D meshes, utilizing a scalable data engine to enhance the quality and quantity of 3D data [20][26].

Group 2: SAM 3D Body
- SAM 3D Body focuses on accurate 3D human pose and shape reconstruction from single images, maintaining high-quality performance even in complex scenarios with occlusions and unusual poses [28][30].
- The model is interactive, allowing users to guide and control predictions, enhancing accuracy and usability [29].
- A high-quality training dataset of around 8 million images was created to improve the model's performance across various 3D benchmarks [33].

Group 3: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation, enabling the model to detect and segment specific concepts based on text or example-image prompts, significantly improving its performance in concept recognition [40][42].
- The architecture of SAM 3 builds on previous advancements, utilizing components like the Meta Perception Encoder and DETR for enhanced image recognition and object detection capabilities [42][44].
- SAM 3 achieves a twofold increase in cgF1 scores for concept recognition and maintains near-real-time performance for images with over 100 detection targets, completing inference in approximately 30 milliseconds on H200 GPUs [44].
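As a rough illustration of the single-image 3D workflow summarized above, the sketch below shows the expected data flow: an image plus an instance mask in, a textured mesh and pose out. The `sam3d` module, `SAM3DObjects` class, and all method names are hypothetical placeholders, not the released interface.

```python
# Hypothetical interface: every name below is illustrative only.
import numpy as np
from PIL import Image

from sam3d import SAM3DObjects  # assumed package and class name

model = SAM3DObjects.from_pretrained("sam-3d-objects")
image = Image.open("kitchen.jpg")

# A 2D instance mask (e.g. produced by SAM 3) selects the object to lift
# into 3D; a stand-in all-False mask here shows only the expected shape.
mask = np.zeros((image.height, image.width), dtype=bool)

result = model.reconstruct(image, mask=mask)
result.mesh.export("object.glb")  # textured 3D mesh
print(result.pose)                # estimated object pose in the scene
```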
AI Vision's GPT Moment: Meta's New Model "Segments the World" in One Click, and Netizens Call It Insane
36Kr· 2025-11-20 10:04
Core Insights
- Meta has launched a new family of models called SAM 3D, which includes SAM 3D Objects for object and scene reconstruction and SAM 3D Body for human shape estimation [1][12]
- The SAM 3D series allows users to extract 3D models from 2D images with high accuracy, enabling 360-degree rotation without noticeable flaws [1][11]
- SAM 3 introduces a new feature called "promptable concept segmentation," enhancing the model's versatility in image segmentation tasks [1][19]

SAM 3D Objects
- SAM 3D Objects has achieved significant advancements in 3D object reconstruction, utilizing a data annotation engine that has labeled nearly one million images to generate over 3.14 million mesh models [7][9]
- The model outperforms existing leading models in human preference tests with a 5:1 advantage, enabling near-real-time 3D applications [10][11]
- SAM 3D Objects can reconstruct shapes, textures, and poses of objects, allowing users to manipulate the camera for different viewing angles [11][12]

SAM 3D Body
- SAM 3D Body focuses on human 3D reconstruction, accurately estimating human poses and shapes from single images, even in complex scenarios [12][13]
- The model supports prompt inputs, allowing users to guide predictions through segmentation masks and keypoints, enhancing interactivity (see the sketch after this summary) [12][13]
- SAM 3D Body has been trained on approximately 8 million high-quality samples, ensuring robustness across diverse scenarios [13][16]

SAM 3 Model Features
- SAM 3 is a unified model capable of detecting, segmenting, and tracking objects based on text, example images, or visual prompts, significantly improving flexibility in segmentation tasks [18][19]
- The model has shown a 100% improvement in concept segmentation performance on the SA-Co benchmark compared to previous models [19][20]
- Meta has implemented a collaborative data engine involving both AI and human annotators to enhance data labeling efficiency and model performance [20][23]

Conclusion
- The rise of generative AI is transforming computer vision (CV) capabilities, expanding the boundaries of model training and dataset creation [24]
- Meta is actively applying these technologies in real business scenarios, suggesting that the SAM and SAM 3D series models may yield further innovations as data and user feedback accumulate [24]
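As a companion to the Objects sketch earlier, here is what the promptable Body workflow referenced in this summary could look like. The summary confirms only that segmentation masks and keypoints can steer the prediction; the `SAM3DBody` class and its argument names are assumptions for illustration.

```python
# Hypothetical interface: class, method, and field names are illustrative.
import numpy as np
from PIL import Image

from sam3d import SAM3DBody  # assumed package and class name

model = SAM3DBody.from_pretrained("sam-3d-body")
image = Image.open("crowd.jpg")

# Optional prompts guide the prediction: an instance mask isolating one
# person, plus a few 2D keypoint hints as (x, y) pixel coordinates.
person_mask = np.zeros((image.height, image.width), dtype=bool)
keypoints = [(412, 188), (430, 250)]

result = model.reconstruct(image, mask=person_mask, keypoints=keypoints)
result.mesh.export("person.glb")   # posed human mesh
print(result.pose, result.shape)   # estimated pose and body-shape parameters
```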
Meta's "Segment Anything" Enters the 3D Era: Image Segmentation Results Convert Straight to 3D, Recovering Even Occluded Objects
量子位· 2025-11-20 07:01
Core Viewpoint
- Meta's new 3D modeling paradigm allows for direct conversion of image segmentation results into 3D models, enhancing the capabilities of 3D reconstruction from 2D images [1][4][8].

Summary by Sections

3D Reconstruction Models
- Meta's MSL lab has released SAM 3D, which includes two models: SAM 3D Objects for object and scene reconstruction, and SAM 3D Body focused on human modeling [4][8].
- SAM 3D Objects can reconstruct 3D models and estimate object poses from a single natural image, overcoming challenges like occlusion and small objects [10][11].
- SAM 3D Objects outperforms existing methods, achieving a win rate at least five times higher than leading models in direct user comparisons [13][14].

Performance Metrics
- SAM 3D Objects shows significant performance improvements in 3D shape and scene reconstruction, with metrics such as an F1 score of 0.2339 and a 3D IoU of 0.4254 [15].
- SAM 3D Body also achieves state-of-the-art (SOTA) results in human modeling, with an MPJPE of 61.7 and a PCK of 75.4 across various datasets [18].

Semantic Understanding
- SAM 3 introduces a concept segmentation feature that allows for flexible object segmentation based on user-defined prompts, overcoming limitations of fixed label sets [21][23].
- The model can identify and segment objects based on textual descriptions or selected examples, significantly enhancing its usability [26][31].

Benchmarking and Results
- SAM 3 has set a new SOTA in promptable segmentation tasks, achieving an accuracy of 47.0% in zero-shot segmentation on the LVIS dataset, surpassing the previous SOTA of 38.5% [37].
- On the new SA-Co benchmark, SAM 3's performance is at least twice as strong as baseline methods [38].

Technical Architecture
- SAM 3's architecture is built on a shared Perception Encoder, which improves consistency and efficiency in feature extraction for both detection and tracking tasks [41][43].
- The model employs a two-stage generative approach for SAM 3D Objects, utilizing a 1.2-billion-parameter flow-matching transformer for geometric predictions (a generic flow-matching sketch follows this summary) [49][50].
- SAM 3D Body utilizes a unique Momentum Human Rig representation to decouple skeletal pose from body shape, enhancing detail in human modeling [55][60].
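For readers unfamiliar with flow matching, the sketch below shows the core training step of the technique in its simplest (rectified-flow, linear-path) form: sample noise, interpolate toward a data point at a random time, and regress the constant path velocity. It is a toy MLP over flat vectors under those assumptions, not SAM 3D's 1.2-billion-parameter transformer.

```python
import torch
import torch.nn as nn

# Toy velocity-field network; any model mapping (x_t, t) -> velocity works.
class VelocityMLP(nn.Module):
    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

model = VelocityMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def flow_matching_step(x1: torch.Tensor) -> float:
    """One step of linear-path flow matching on a batch of data x1."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0])                    # random time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # point on the straight path
    target_v = x1 - x0                             # constant path velocity
    loss = ((model(x_t, t) - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Each row stands in for a flattened geometric latent (an assumption made
# purely for this demo); real systems also condition on image features.
print(flow_matching_step(torch.randn(8, 64)))
```

At generation time the learned velocity field is integrated from noise to data with an ODE solver, which is what lets flow-matching models reach near-real-time speeds relative to many-step diffusion sampling.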
Segmenting Everything Isn't Enough, Now Reconstruct Everything in 3D: SAM 3D Is Here
机器之心· 2025-11-20 02:07
Core Insights
- Meta has launched significant updates with the introduction of SAM 3D and SAM 3, enhancing the understanding of images in 3D [1][2]

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series, featuring two models that convert static 2D images into detailed 3D reconstructions [2][5]
- SAM 3D Objects focuses on object and scene reconstruction, while SAM 3D Body specializes in human shape and pose estimation [5][28]
- Meta has made the model weights and inference code for SAM 3D and SAM 3 publicly available [7]

Group 2: SAM 3D Objects
- SAM 3D Objects introduces a novel technical approach for robust and realistic 3D reconstruction and object pose estimation from a single natural image [11]
- The model can generate detailed 3D shapes, textures, and scene layouts from everyday photos, overcoming challenges like small objects and occlusions [12][13]
- Meta has annotated nearly 1 million images, generating approximately 3.14 million 3D meshes, leveraging a scalable data engine for efficient data collection [17][22]

Group 3: SAM 3D Body
- SAM 3D Body addresses the challenge of accurate human 3D pose and shape reconstruction from a single image, even in complex scenarios [28]
- The model supports interactive input, allowing users to guide and control predictions for improved accuracy [29]
- A high-quality training dataset of around 8 million images was created to enhance the model's performance across various 3D benchmarks [31]

Group 4: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation, enabling the model to identify and segment instances of specific concepts based on text or example images [35]
- The architecture of SAM 3 builds on previous AI advancements, utilizing the Meta Perception Encoder for enhanced image recognition and object detection [37]
- SAM 3 has achieved a twofold improvement in concept segmentation performance compared to existing models, with rapid inference times even for images with numerous detection targets [39]
SAM 3 Surfaces at ICLR 2026; the Next Step for Segment Anything: Making the Model Understand "Concepts"
具身智能之心· 2025-10-14 00:02
Core Viewpoint
- The article discusses the release of the paper "SAM 3: Segment Anything with Concepts" by Meta, which introduces advancements in the field of computer vision, particularly in promptable concept segmentation [3][5][9].

Summary by Sections

Introduction
- The paper "SAM 3" has gained significant attention, suggesting it is a continuation of Meta's "Segment Anything" series, following the previous versions SAM 1 and SAM 2 [3][5][6].

Key Developments
- SAM 3 introduces a new task called Promptable Concept Segmentation (PCS), allowing users to input text or image examples to predict instance and semantic masks for matching objects while maintaining identity consistency across video frames [9][17].
- The focus is on identifying atomic visual concepts, enabling the model to understand simple noun phrases like "red apple" or "striped cat" for segmentation [9][12].

Performance Improvements
- SAM 3 shows significant performance improvements over SAM 2, achieving at least a 2x enhancement on the new benchmark SA-Co, with a zero-shot mask average precision of 47.0 on the LVIS dataset, surpassing the previous best of 38.5 [13][14].
- The model processes images with over 100 objects in just 30 milliseconds on a single H200 GPU [14].

Methodology
- SAM 3 is built on a dual encoder-decoder transformer architecture, integrating a detector with a tracker and memory module for video applications [19].
- A scalable human-machine collaborative data engine was developed, annotating a high-quality training dataset with 4 million unique phrases and 520 million masks [20].

Benchmarking and Results
- SAM 3 outperforms previous models across benchmarks, achieving a cgF1 score double that of the strongest baseline, OWLv2, on the open-vocabulary SA-Co/Gold dataset (a generic mask-matching F1 sketch follows this summary) [28].
- Across multiple public benchmarks, SAM 3 consistently exceeds strong expert baselines, demonstrating its effectiveness in instance segmentation and object detection tasks [27][30].

Conclusion
- The advancements in SAM 3 position it as a leading model in the field of computer vision, particularly in the area of promptable segmentation, showcasing Meta's commitment to pushing the boundaries of AI technology [9][12][19].
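To make the evaluation language above concrete, here is a small, self-contained sketch of how mask-level scores of this family are generally computed: pairwise mask IoU, greedy one-to-one matching at an IoU threshold, then precision, recall, and F1 over the matches. This is the generic recipe, not the paper's exact SA-Co protocol or its cgF1 definition.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_f1(preds, gts, thr=0.5):
    """Greedy one-to-one matching of predicted to ground-truth masks
    at IoU >= thr, then precision / recall / F1 over the matches."""
    used, tp = set(), 0
    for p in preds:
        best, best_iou = None, thr
        for j, g in enumerate(gts):
            if j in used:
                continue
            iou = mask_iou(p, g)
            if iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            used.add(best)
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Tiny synthetic example: two ground-truth masks, two predictions.
g1 = np.zeros((8, 8), bool); g1[:4, :4] = True
g2 = np.zeros((8, 8), bool); g2[4:, 4:] = True
p1 = np.zeros((8, 8), bool); p1[:4, :5] = True   # overlaps g1 (IoU 0.8)
p2 = np.zeros((8, 8), bool); p2[0, 7] = True     # spurious detection
print(match_f1([p1, p2], [g1, g2]))  # -> (0.5, 0.5, 0.5)
```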
Mysterious ICLR Paper Revealed: SAM 3 Sees the World Through "Concepts," Rebuilding the Visual AI Paradigm
36Kr· 2025-10-13 23:57
Core Insights
- The upcoming upgrade of the SAM model, SAM 3, focuses on "concept-based segmentation," allowing for segmentation based on semantic concepts rather than just pixels or instances [6][8][15]
- SAM 3 introduces a new standard called Promptable Concept Segmentation (PCS), enabling the model to identify and segment all objects that fit a given concept across various images and videos [8][12][16]
- The model has been trained on a vast dataset, including approximately 4 million unique concept labels, enhancing its ability to understand and segment based on user prompts [6][11][27]

Group 1: SAM 3 Features
- SAM 3 emphasizes interactive refinement of segmentation results, allowing users to provide additional prompts to clarify ambiguous cases [8][11]
- The model can track multiple instances of the same concept across different frames in a video, improving its utility in dynamic environments [8][12]
- SAM 3 achieves significant performance improvements, with a zero-shot segmentation accuracy of 47.0 on the LVIS dataset, surpassing the previous best of 38.5 [11][28]

Group 2: Data Engine and Training
- A human-AI collaborative data engine has been developed to enhance the training process, allowing the model to learn from its mistakes and improve accuracy (sketched schematically after this summary) [19][22]
- The data engine consists of four phases, starting with human validation and progressing to AI-assisted validation and video annotation [21][25]
- The final dataset, SA-Co, includes 126,000 samples and 214,000 unique phrases, making it one of the largest open-vocabulary segmentation datasets available [28]

Group 3: Concept Segmentation Challenges
- PCS faces challenges due to the vast range of possible concepts, leading to ambiguities that the model must navigate [14]
- To address these ambiguities, SAM 3 employs multi-expert annotations and optimized evaluation protocols to ensure objectivity and accuracy [14][19]
- The model includes a dedicated "ambiguity module" to help it understand and tolerate vague boundaries in concept definitions [14][19]
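A schematic sketch of the human-AI collaborative loop described in Group 2. The routing rule and threshold are illustrative assumptions; the summary specifies only that AI verifiers progressively absorb the easy cases while humans keep correcting the hard, ambiguous ones.

```python
def data_engine_round(images, segmenter, ai_verifier, human_review,
                      confidence_threshold=0.9):
    """One round of a human-AI collaborative labeling loop (illustrative):
    the current model proposes masks, an AI verifier auto-accepts the
    confident ones, and ambiguous proposals are routed to human annotators.
    """
    accepted, queued_for_humans = [], []
    for image in images:
        for proposal in segmenter(image):
            if ai_verifier(image, proposal) >= confidence_threshold:
                accepted.append((image, proposal))
            else:
                queued_for_humans.append((image, proposal))
    # Humans correct or reject only the hard residue; their fixes become
    # training data that sharpens both the segmenter and the verifier.
    accepted.extend(human_review(queued_for_humans))
    return accepted
```

Each round's accepted labels retrain the segmenter, which is what lets annotation throughput compound across the four phases the article describes.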